The most detailed and down-to-earth overview of autonomous driving decision making, control, and motion planning!

/ Guide /

My research direction is Decision Making And Motion Planning (DMAP) for autonomous driving. I previously published a review in an SCI journal (riding on my advisor's coattails). Along the way I read hundreds of papers (the review alone cited more than 100, so you could say I took plenty of wrong turns). Remembering how lost and directionless I felt when I started, and how little simple, accessible, and comprehensive material on this topic exists online, I decided to share what I have learned for your reference.

Note that everything here can be found online or in the literature. The advantage of this blog is simply that it saves you a great deal of time hunting down papers and piecing the context together (instead of reading the literature aimlessly), and maps out what the decision-making and planning directions are. I will cover many of the more common algorithms in AD, and I will try to keep a plain, conversational style; those who prefer the academic style can go straight to the literature.

1

Introduction

Figure 1: The three types of perception-decision-control methods currently used in autonomous vehicles

The decision-control methods shown in Figure 1 are, from top to bottom, sequential planning, behavior-aware planning, and end-to-end planning. Sequential planning is the most traditional method, with clearly separated perception, decision-making, and control layers.

Compared with the first method, the highlight of behavior-aware planning is that it introduces human-machine co-driving, vehicle-road coordination, and risk estimation of the external dynamic environment. End-to-end planning is based on DL and DRL and relies on large amounts of training data; it is one of the most popular ways to learn the mapping from images and other sensory information to steering wheel angle and other vehicle control inputs.

This article briefly introduces these three methods, the problems they face, and the results achieved so far. Section 2 covers sequential planning, describing the vehicle's perception-to-control pipeline in decision-control order. Section 3 introduces end-to-end autonomous driving, Section 4 introduces behavior-aware planning, and Section 5 introduces safety verification methods. Finally, we briefly summarize the problems that remain to be solved.

2

Path Planning

Figure 2: The sequential planning process

Figure 2 shows the sequential planning pipeline, briefly summarized as path planning -> decision making -> vehicle control. The path planning described in this section corresponds to the first and third steps.

For generating trajectories for autonomous vehicles there are two approaches: direct trajectory generation, and path-velocity decomposition. The latter is easier and therefore more commonly used. Path planning algorithms fall into four categories: sampling-based algorithms, represented by PRM and RRT; search-based algorithms, represented by A* and D*; interpolation-and-fitting trajectory generation, represented by β-spline curves; and optimal-control algorithms for local path planning, represented by MPC. This section explains them one by one in that order:

Figure 3: Illustrations of the path planning algorithms

Table 2: Advantages and disadvantages of path planning algorithms

2.1 Sampling-Based Algorithms

1. Construction of the basic algorithms: PRM and RRT

Figure 4: PRM illustration

(1) Construction of PRM[1]

1) Preprocessing stage: uniformly and randomly sample n points in the safe region of the state space; connect each sampling point to its neighbors within a certain distance, discarding any edge that collides with an obstacle; the result is a connectivity graph.

2) Query stage: given a pair of initial and goal states, connect them to the constructed graph, then use a graph search algorithm to find a trajectory that meets the requirements. A minimal sketch of the two stages follows.
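To make the two stages concrete, here is a minimal PRM sketch in Python. It assumes a toy 2-D unit-square configuration space; `is_free` (point collision test) and `collision_free` (straight-segment collision test) are hypothetical user-supplied helpers, not part of any reference:

```python
import numpy as np

def build_prm(n, radius, is_free, collision_free, start, goal):
    """Minimal PRM: preprocessing builds the roadmap, the query stage adds start/goal."""
    # Preprocessing: sample n collision-free points uniformly in the unit square
    nodes = []
    while len(nodes) < n:
        q = np.random.uniform(0.0, 1.0, size=2)
        if is_free(q):
            nodes.append(q)
    # Query: insert the initial and goal states into the graph
    nodes += [np.asarray(start, float), np.asarray(goal, float)]
    # Connect every pair closer than `radius` whose straight edge is collision-free
    edges = {i: [] for i in range(len(nodes))}
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            d = np.linalg.norm(nodes[i] - nodes[j])
            if d < radius and collision_free(nodes[i], nodes[j]):
                edges[i].append((j, d))
                edges[j].append((i, d))
    return nodes, edges  # run a graph search (e.g. A*/Dijkstra) over this roadmap
```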

It is easy to see that once a PRM has been built, it can solve motion planning problems for many different initial and goal states, but this feature is unnecessary for autonomous vehicle motion planning. Moreover, PRM requires exact connections between states, which is very difficult for motion planning problems with complex differential constraints.

(2) Construction of RRT[2]

1) Tree initialization: initialize the node set and edge set of the tree; the node set contains only the initial state, and the edge set is empty.

2) Tree growth: randomly sample the state space. When a sampling point falls in the safe region, select the node of the current tree closest to it and extend (or connect) that node toward the sampling point. If the generated trajectory does not collide with any obstacle, add it to the tree's edge set and add its endpoint to the node set.

3) Repeat step 2) until the tree reaches the goal state set; a minimal sketch follows. In contrast to PRM's undirected graph, RRT builds a tree rooted at the initial state with the goal state as a leaf, so a different tree must be constructed for each pair of initial and goal states. On the other hand, RRT does not require exact connections between states, which makes it better suited to kinodynamic problems such as autonomous vehicle motion planning.
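Under the same toy assumptions (2-D unit square, hypothetical `is_free`/`collision_free` helpers), a minimal RRT sketch of the three steps:

```python
import numpy as np

def rrt(start, goal, is_free, collision_free,
        step=0.05, max_iter=5000, goal_tol=0.05):
    """Minimal RRT; returns the node list and a parent map encoding the tree."""
    nodes = [np.asarray(start, float)]   # 1) node set holds only the initial state
    parent = {0: None}                   #    edge set starts empty
    for _ in range(max_iter):
        q_rand = np.random.uniform(0.0, 1.0, size=2)   # 2) sample the state space
        if not is_free(q_rand):
            continue
        i_near = min(range(len(nodes)),
                     key=lambda i: np.linalg.norm(nodes[i] - q_rand))
        d = q_rand - nodes[i_near]
        q_new = nodes[i_near] + step * d / (np.linalg.norm(d) + 1e-12)
        if collision_free(nodes[i_near], q_new):       # keep collision-free edges only
            nodes.append(q_new)
            parent[len(nodes) - 1] = i_near
            if np.linalg.norm(q_new - np.asarray(goal)) < goal_tol:
                return nodes, parent     # 3) goal set reached
    return nodes, parent                 # iteration budget exhausted
```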

2. Problems of sampling-based algorithms and their solutions

Almost all of these algorithms face the same two problems: solution efficiency, and whether the solution is optimal. PRM and RRT are probabilistically complete because they eventually cover almost every position in the configuration space, as shown in panel (e) of Figure 2.

To improve solution efficiency, the core idea in optimizing RRT is to guide the tree toward open areas, i.e., away from obstacles, avoiding repeated failed expansions of nodes near obstacles. The specific methods are as follows:

(1) Uniform sampling

The standard RRT algorithm samples the state space uniformly at random. The probability that a node of the current tree is expanded is proportional to the volume of its Voronoi region, so the tree grows toward the open areas of the state space and fills the free space evenly.

Reference [3] proposes the RRT-Connect algorithm, which simultaneously grows two trees, one from the initial state and one from the goal state; a feasible solution is found when the two trees meet. Goal biasing [4] inserts the goal state into the random sampling sequence at a fixed rate, guiding the tree to expand toward the goal, which speeds up the solution and improves its quality; a sketch follows.
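Goal biasing amounts to a one-line change in the sampler; a sketch (the 5% bias is an illustrative value, not from [4]):

```python
import numpy as np

def biased_sample(goal, goal_bias=0.05):
    """Return the goal itself with probability goal_bias, else a uniform sample."""
    if np.random.rand() < goal_bias:
        return np.asarray(goal, float)          # pull the tree toward the goal
    return np.random.uniform(0.0, 1.0, size=2)  # otherwise standard uniform sampling
```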

Reference [5] proposed heuristic RRT, which uses a heuristic function to increase the probability that low-expansion-cost nodes are sampled. This function requires computing the cost of every node in the tree, but in complex environments the cost function is often hard to define.

To solve this problem, reference [6] proposed f-biased sampling: first discretize the state space into a grid, then use Dijkstra's algorithm to compute a cost for each cell; every point within a cell takes that cell's value, and the result is used to construct the heuristic function.

(2) Optimizing the distance metric

Distance measures the cost of the path (trajectory) between two configurations (states) in the configuration space (state space); this cost can be understood as path length, energy consumed, or time spent. The reason for measuring distance is to help construct heuristic cost functions that guide the growth of the tree. In practice, however, computing distances in the presence of obstacles is very hard, so motion planning usually adopts a Euclidean-like definition of distance.

Reference [7] gives a weighted distance expression for the configuration space: weights are assigned to the coordinates and states of the two configuration points to construct the expression. However, this expression does not fully reflect the true distance between the two configurations, so it introduces a large error.

Reference [8] proposed RG-RRT (reachability-guided RRT), which eliminates the effect of inaccurate distances on RRT's exploration ability. RG-RRT computes the reachable set of each node in the tree; a node may be selected for expansion only when the distance from the sampling point to the node is greater than the distance from the sampling point to the node's reachable set.

(3) Reduce the number of collision checks

Collision checking is one of the efficiency bottlenecks of sampling-based algorithms. The usual approach is to discretize the path at equal intervals and then check the configuration at each point.

Reference [9] proposed RC-RRT (resolution-complete RRT) to reduce the probability that nodes near obstacles are expanded. The idea is to discretize the input space so that, for a given node, each input can be applied only once; if the trajectory for an input collides with an obstacle, a penalty is added to the node, and the higher the penalty, the lower the node's probability of being expanded.

References [10] and [11] proposed DD-RRT (dynamic domain RRT) and ADD-RRT (adaptive dynamic domain RRT) respectively, which restrict sampling to a local region around the current tree, preventing repeated failed expansions of nodes near obstacles and improving efficiency.

(4) Improving real-time performance

Anytime RRT [12] is a highly real-time algorithm. It first builds an RRT quickly, obtains a feasible solution, and records its cost. It then continues sampling, but inserts into the tree only those nodes that help reduce the cost of the feasible solution, gradually obtaining better solutions. Replanning [13] is another real-time planning algorithm: it decomposes the whole planning task into a sequence of equal-duration subtasks and plans the next one while executing the current one.

(5) For the optimality problem, the main methods are as follows:

References [14-16] improved the standard PRM and RRT using the theory of random geometric graphs, obtaining the asymptotically optimal PRM*, RRG, and RRT* algorithms. Randomly sample n points in the state space and connect every pair of points closer than r(n) to form a random geometric graph (RGG). Asymptotic optimality requires r(n) to satisfy:

r(n) = γ (log n / n)^(1/d)

where γ depends on the specific environment and d is the dimension of the state space. Modifying the standard PRM and RRT according to this idea yields the PRM* and RRG algorithms. RRT* adds a "rewire" step to RRG: check whether taking the newly inserted node as the parent of a neighboring node would reduce that neighbor's cost; if so, remove the neighbor's original parent-child relation and make the new node its parent. A sketch of the radius formula and the rewire step follows.
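A sketch of the two key ingredients; `edge_cost` and the neighbor query are assumed helpers, and `cost`/`parent` are the tree's cost-to-come and parent maps:

```python
import numpy as np

def connection_radius(n, d, gamma):
    """Shrinking neighbor radius r(n) = gamma * (log n / n)^(1/d)."""
    return gamma * (np.log(n) / n) ** (1.0 / d)

def rewire(nodes, parent, cost, i_new, neighbor_ids, edge_cost):
    """RRT* rewire step: adopt the new node as parent wherever that lowers cost."""
    for j in neighbor_ids:
        c_via_new = cost[i_new] + edge_cost(nodes[i_new], nodes[j])
        if c_via_new < cost[j]:   # cheaper to reach j through the new node
            parent[j] = i_new     # drop j's original parent-child relation
            cost[j] = c_via_new
```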

The large number of node connections and local adjustments make PRM* and RRT* very inefficient. Reference [17] proposed the LBT-RRT algorithm, which combines RRG and RRT*. Reference [18] improved on PRM*, greatly reducing the number of node connections while retaining the weakest conditions for asymptotic optimality. Both methods achieve higher efficiency while preserving asymptotic optimality.

Efficiency and optimality cannot be optimized at the same time. If the algorithm's trade-off between the two were treated as a parameter, and that parameter were made dynamic, the final performance might be much better (think of how a Kalman filter for a self-balancing vehicle weighs the accelerometer measurement against the gyroscope prediction according to the noise level, with good results). Whether this idea can be turned into a dynamic weighting of efficiency against optimality is an open question; right now I cannot work it out, my thoughts are still a bit messy, and I will come back to it as my ability improves.

Searching for the optimal solution increases the computational load. Besides the methods mentioned above, efficiency can also be improved by first using RRT* to find a feasible solution and then refining that solution, as in references [19][20].

2.2 Search-Based Algorithms

Another category of motion planning algorithms is heuristic search. The basic idea is to discretize the state space into a graph in some way, then use heuristic search algorithms to find feasible or even optimal solutions. These algorithms are complete, and even optimal, with respect to the chosen discretization, and are now relatively mature. A rough classification is shown in Figure 1.

Figure 5: Schematic diagram of the state lattice

The basis of search-based algorithms is the state lattice. The state lattice [21] is a way of discretizing the state space: it consists of nodes, each representing a state, and motion primitives that start at a node and lead to neighboring nodes; one state node can be transformed into another through a motion primitive.

In this way the state lattice converts the originally continuous state space into a search graph, and the motion planning problem becomes a search for the sequence of motion primitives that transforms the initial state into the goal state. Once the lattice is constructed, a graph search algorithm can be used to find the optimal trajectory.

1. Construction of the basic algorithms: Dijkstra and A*

Dijkstra's algorithm [22] traverses the entire configuration space, computing the distance between every pair of cells, and finally selects the shortest path from start to goal; its breadth-first nature makes it inefficient. Adding a heuristic function to this algorithm, i.e., an estimate of the distance from the expanded node to the goal node, and searching with it avoids the inefficiency of the global search. This is the A* algorithm [23], shown in the figure below, where red marks the searched area; a minimal sketch follows the figure.

Figure 6: A* and Dijkstra algorithm effect comparison chart
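A textbook A* sketch over an abstract graph; `neighbors(u)`, yielding `(v, edge_cost)` pairs, and `heuristic` are assumed to be supplied by the state representation. With the heuristic identically zero it degenerates to Dijkstra, and multiplying the heuristic by a weight w > 1 gives the Weighted A* discussed below:

```python
import heapq

def a_star(start, goal, neighbors, heuristic):
    """A* search; returns the node sequence from start to goal, or None."""
    g = {start: 0.0}                               # cost-to-come
    parent = {start: None}
    open_heap = [(heuristic(start, goal), start)]  # priority f = g + h
    closed = set()
    while open_heap:
        _, u = heapq.heappop(open_heap)
        if u == goal:                              # walk parents back to the start
            path = []
            while u is not None:
                path.append(u)
                u = parent[u]
            return path[::-1]
        if u in closed:
            continue
        closed.add(u)
        for v, w in neighbors(u):
            if g[u] + w < g.get(v, float("inf")):  # found a cheaper way to v
                g[v] = g[u] + w
                parent[v] = u
                heapq.heappush(open_heap, (g[v] + heuristic(v, goal), v))
    return None                                    # goal unreachable
```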

2. Problems faced by search-based algorithms and their solutions

Similar to sampling-based algorithms, this class also needs to be optimized for efficiency and optimality.

In terms of efficiency, A* itself is a static planning algorithm. Among A* extensions, Weighted A* [24] increases the weight of the heuristic function to pull the search toward the goal node; the search is very fast, but it easily falls into local minima and cannot guarantee a globally optimal solution.

For moving vehicles, A*'s derivative D* (dynamic A*) [25] greatly improves efficiency. Also based on dynamic programming, LPA* [26] handles state lattices whose motion-primitive costs vary with time: when the environment changes, a new optimal path can be planned by re-searching only a small number of nodes. D* Lite [27], developed from LPA*, obtains the same results as D* with higher efficiency.

For finding optimal solutions, ARA* [28] is an anytime search algorithm developed from Weighted A*. It calls Weighted A* repeatedly, reducing the heuristic weight with each call, so the algorithm quickly finds a feasible solution; by introducing the set INCONS, each iteration can reuse information from the previous one, refining the path and gradually approaching the optimal solution.

To balance efficiency and optimality, Sandip Aine et al. proposed the MHA* algorithm [29], which introduces multiple heuristic functions while ensuring that one of them, used alone, can find the optimal solution; by coordinating the path costs produced by the different heuristics, the algorithm balances efficiency and optimality. DMHA* [30] builds on MHA* by generating suitable heuristic functions online in real time, thereby avoiding the local minimum problem.

A* has several other variants: M* [31] is dedicated to multi-robot motion planning, and R* [32] combines A* with random sampling, which avoids local minima to a certain extent.

2.3 Interpolation- and Fitting-Based Algorithms

Interpolation- and fitting-based algorithms can be defined as follows: given a set of known points describing a road map, use data interpolation and curve fitting to create the path the intelligent vehicle will travel; such a path offers good continuity and differentiability. The specific methods are as follows:

The Dubins curve [33] and the Reeds-Shepp (RS) curve [34] are the shortest paths connecting any two points in the configuration space, for the cases without and with reversing respectively. Both are composed of maximum-curvature arcs and straight lines, so the curvature is discontinuous where arc meets line; a real vehicle driving along such a curve must stop at each curvature discontinuity and turn the steering wheel in place before continuing.

The curvature of the clothoid [35] is proportional to the curve's length, i.e., the path's curvature varies linearly with arc length. It can be used as a transition from a straight line to a circular arc, transforming the Dubins and RS curves into curvature-continuous paths. The most representative result is CC-Steer [36], which suits low-speed motion planning.

Polynomial interpolation [37][38][39] is the most commonly used method: the polynomial coefficients are set to satisfy the boundary conditions at the nodes, giving good continuity and differentiability. Fourth-order polynomials are often used for longitudinal constraint control, fifth-order polynomials for lateral constraint control [37], and third-order polynomials have also been used for overtaking trajectories [40]; a fifth-order sketch follows.
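For instance, a fifth-order polynomial for the lateral motion is fully determined by position, velocity, and acceleration at both ends of the segment. A minimal sketch (the boundary values in the example are illustrative):

```python
import numpy as np

def quintic_coeffs(x0, v0, a0, xf, vf, af, T):
    """Coefficients c0..c5 of x(t) = sum(c_k t^k) matching position, velocity,
    and acceleration at t = 0 and t = T."""
    A = np.array([
        [1, 0, 0,    0,      0,       0      ],  # x(0)   = x0
        [0, 1, 0,    0,      0,       0      ],  # x'(0)  = v0
        [0, 0, 2,    0,      0,       0      ],  # x''(0) = a0
        [1, T, T**2, T**3,   T**4,    T**5   ],  # x(T)   = xf
        [0, 1, 2*T,  3*T**2, 4*T**3,  5*T**4 ],  # x'(T)  = vf
        [0, 0, 2,    6*T,    12*T**2, 20*T**3],  # x''(T) = af
    ], dtype=float)
    return np.linalg.solve(A, np.array([x0, v0, a0, xf, vf, af], dtype=float))

# Example: a 3 m lateral shift over 4 s, starting and ending at rest
coeffs = quintic_coeffs(0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 4.0)
```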

Spline curves have closed-form expressions, so curvature continuity is easy to guarantee. The β-spline [41] achieves curvature continuity, and the cubic Bezier curve [42] guarantees continuous, bounded curvature with relatively little computation. The η^3 curve [43] is a seventh-degree spline with very good properties: continuous curvature and continuous curvature derivative, which are very valuable for high-speed vehicles.

2.4 Optimal-Control-Based Algorithms

Optimal-control algorithms are included under path planning mainly because MPC can perform local path planning to avoid obstacles; apart from that, MPC's main role is trajectory tracking. In addition to the necessary kinematic and dynamic constraints, future work should also account for comfort, the uncertainty of perception [45], and the uncertainty of inter-vehicle communication [46]; the driver can also be brought into the control loop during local trajectory planning [38][44]. These uncertainties and driver-in-the-loop control are discussed in Section 4.

Many kinds of prediction model are used in MPC, such as convolutional neural networks, fuzzy models, and state-space models, with the state-space form used most. MPC can be summarized as follows: subject to the necessary kinematic and dynamic constraints, solve the model by numerical means (numerical, because the model is usually too complex for traditional analytical tools such as the calculus of variations) for the optimal control inputs of the state equation, e.g. steering wheel angle; then apply those inputs to the vehicle model to obtain the required state variables, such as speed, acceleration, and position. A sketch of this receding-horizon loop follows.
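A minimal sketch of that receding-horizon loop, using a toy discrete linear state-space model (a double integrator standing in for the vehicle) and a generic numerical optimizer; all names and numbers are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

DT, N = 0.1, 10                           # time step [s] and prediction horizon
A = np.array([[1.0, DT], [0.0, 1.0]])     # state: [position, speed]
B = np.array([0.0, DT])                   # input: acceleration command

def horizon_cost(u_seq, x0, x_ref):
    """Quadratic tracking error plus control effort, summed over the horizon."""
    x, cost = np.asarray(x0, float), 0.0
    for u in u_seq:
        x = A @ x + B * u                 # roll the prediction model forward
        cost += np.sum((x - x_ref) ** 2) + 0.1 * u ** 2
    return cost

def mpc_step(x0, x_ref, u_max=2.0):
    """Solve numerically over the whole horizon, then apply only the first input."""
    res = minimize(horizon_cost, np.zeros(N), args=(x0, np.asarray(x_ref, float)),
                   bounds=[(-u_max, u_max)] * N)
    return res.x[0]                       # receding horizon: re-solve at the next step

u0 = mpc_step([0.0, 0.0], [1.0, 0.0])     # drive position to 1 m and stop there
```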

From the above description, the keys to MPC are model construction and model solution; how to simplify the model equivalently and improve the efficiency of the solution is the top priority.

To reduce the difficulty of modeling, references [47][48][49] model with an artificial potential field. The basic idea resembles an electric field: obstacles on the road act like charges that repel the vehicle, while the goal attracts it. The effect is shown in the figure below, and a minimal sketch follows it.

Figure 7: Schematic diagram of potential energy field
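A minimal sketch of the resulting force field, with the standard attractive and repulsive terms; the gains and the influence radius `d0` are illustrative, not taken from [47-49]:

```python
import numpy as np

def apf_force(q, goal, obstacles, k_att=1.0, k_rep=0.5, d0=1.0):
    """Artificial potential field: the goal attracts, nearby obstacles repel."""
    q = np.asarray(q, float)
    force = k_att * (np.asarray(goal, float) - q)     # attractive term
    for obs in obstacles:
        diff = q - np.asarray(obs, float)
        d = np.linalg.norm(diff)
        if 0.0 < d < d0:                              # repulsion only within d0
            force += k_rep * (1.0 / d - 1.0 / d0) * (diff / d) / d**2
    return force  # stepping the vehicle along `force` descends the potential

step_dir = apf_force([0.0, 0.0], goal=[5.0, 0.0], obstacles=[[2.0, 0.3]])
```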

References [50-51] and [15-18] proposed ways to improve the efficiency of the MPC solution.

3

End To End DMAP

As Figure 1 shows, all the methods introduced above belong to the first type, the traditional layered perception-decision-control approach; the third, end-to-end type, as shown in the figure, goes from input to output "in one step".

End-to-end methods usually use deep learning or deep reinforcement learning. There are two kinds of input-output mapping: one maps images and other sensory inputs to control variables such as steering wheel angle, the other maps sensory inputs to vehicle states such as speed and coordinates. Both require large amounts of data. During network training, images are usually randomly translated and rotated to augment the data.

Caltagirone [52] built a convolutional neural network mapping lidar point clouds and GPS-IMU inertial data to path plans. The method can label the collected data automatically (supervised learning) and be retrained once a week.

NVIDIA's researchers [53] achieved a direct mapping from the vehicle's front camera image to the steering wheel angle by building a deep convolutional neural network. Experiments verified that the method handles gravel roads and construction zones well, and the car can drive at night when the light is poor; an illustrative sketch follows.
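In spirit, such a network stacks convolutional feature extraction on top of a fully connected regressor that outputs a single steering value. The PyTorch sketch below is loosely modeled on [53]; the layer sizes and the 66x200 input are illustrative, not a faithful reproduction of NVIDIA's network:

```python
import torch
import torch.nn as nn

class SteeringNet(nn.Module):
    """Camera frame -> steering angle, end to end (illustrative layer sizes)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 24, 5, stride=2), nn.ReLU(),
            nn.Conv2d(24, 36, 5, stride=2), nn.ReLU(),
            nn.Conv2d(36, 48, 5, stride=2), nn.ReLU(),
            nn.Conv2d(48, 64, 3), nn.ReLU(),
            nn.Conv2d(64, 64, 3), nn.ReLU(),
        )
        self.regressor = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(100), nn.ReLU(),  # LazyLinear infers the flattened size
            nn.Linear(100, 50), nn.ReLU(),
            nn.Linear(50, 1),               # one continuous output: steering angle
        )

    def forward(self, img):                 # img: (batch, 3, 66, 200), normalized
        return self.regressor(self.features(img))

model = SteeringNet()
angle = model(torch.zeros(1, 3, 66, 200))   # smoke test on a dummy frame
```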

In [54], Gurghian obtained detailed, clear road information from cameras mounted on both sides of the vehicle and estimated the vehicle's lateral position on the road end to end. Similarly, Chen [55] used neural networks to obtain highly interpretable state variables such as heading angle and coordinates. Both of these end-to-end methods send their outputs to a controller for vehicle control.

Xu [56] used large-scale driving video data to train a fully convolutional long short-term memory network. Its output can be discrete behaviors, such as turning left or right, or continuous quantities such as steering wheel angle.

Reference [57] proposed a socially aware obstacle avoidance method based on deep reinforcement learning, which learns socially aware behaviors directly in multi-agent scenes by constructing a symmetric neural network.

There is always a gap between virtual and real scenes, so autonomous vehicles trained in virtual scenes need strategies to cope with this gap, which requires estimating the uncertainty of the neural network (Bayesian networks). However, recent ensemble, bootstrap, and Monte Carlo dropout methods cannot accurately quantify the uncertainty. Reference [58] therefore proposed a method that evaluates the uncertainty and, when the risk is high, cancels end-to-end control and switches to traditional layered control.

When training a reinforcement learning network, real-vehicle training is impossible, so simulation tests are generally used. Simulation yields not only a large amount of collision data but also the ground-truth relationships in the virtual scene, which helps in training the reward function.

Wolf [59] used a Deep Q-Network to map perceptual information to steering. During testing they found that describing the vehicle's position on the road by its offset angle to the road centerline improved the algorithm's overall performance. Reference [60] proposed a method to improve the realism of virtual scenes: images from the simulator are first fed into a segmentation network for classification, and the results are then passed through a generative network to produce scenes closer to reality.

For continuous action spaces, Lillicrap [61] proposed a DRL algorithm that is actor-critic and model-free and is trained with the deterministic policy gradient; it achieved good results in simulation testing.

4

Behavior-Aware Motion Planning

Behavior-aware motion planning corresponds to the second row of Figure 1. Compared with the other two methods, its highlight is that the decision-planning process becomes an interactive one, covering both driver-vehicle and vehicle-environment interaction. The purpose of studying this method is to incorporate the uncertainty of the external environment into decision making so as to improve driving safety. In this line of research, assistance from V2X technology is not considered.

This section describes the classes of methods used in this research: cooperation and interaction, game-theoretic approaches, probabilistic approaches, partially observable Markov decision processes (POMDPs), and learning-based approaches (machine learning).

4.1 Cooperation And Interaction

When a self-driving vehicle encounters an emergency, or the situation becomes very urgent, the vehicle simply stops; this behavior is called the freezing robot problem [62]. If the vehicle does not stop but keeps moving forward, it is very likely to risk a collision.

To handle the uncertainty the vehicle faces in such situations, one of the following three approaches is usually chosen:

(1) Model the dynamics of the external environment better. Reference [63] incorporates the predicted future states of the external environment into the model to reduce uncertainty. However, reference [62] points out that even complete knowledge of the motion of external agents cannot entirely prevent the freezing robot problem.

(2) When modeling, treat the reactions of external agents to the ego vehicle as constraints [64]. This method, however, must assume that external agents are fully predictable and known; intuitively, over-trusting the model also invites serious failures (otherwise, why would trust need uncertainty?).

(3) Treat the ego vehicle and the external agents as a whole, so that they share one distribution, such as a probability distribution [62] or a value-function distribution.

Reference [65] gives the definition of the cooperation-and-interaction approach, and reference [66], from a collaboration perspective, divides such cooperative vehicle behavior into the two dimensions of inter-vehicle communication and cooperation.

4.2 Game-Theoretic Approaches

Control algorithms usually assume that external agents, such as other vehicles, are controlled so as to minimize their own cost functions, where the benefit is evaluated with a cost, reward, or utility function. Besides this, the other approach finds the extremum of some function based on maximum likelihood or maximum a posteriori estimation. Both seek the best value of an evaluation function, but the two methods differ.

For example, the second method first takes an action when encountering another vehicle, then models that vehicle and maximizes its reward value according to its behavior. This shares a weakness with point (2) mentioned under cooperation and interaction: the modeling process also amounts to an indirect control of the external vehicles.

When building an interaction model, the algorithm's complexity grows exponentially with the number of agents, so improving solution efficiency becomes even more important. The simplest way is to discretize the action space according to the agents' motions and then search the whole space for candidate actions [65]. There are many efficient methods for searching this kind of space; reference [67] proposes a tree-structured search, but the size of the tree also grows exponentially with the number of agents.

For a more efficient algorithm, reference [68] proposed a Monte Carlo tree search method. Reference [69] assumes that a following vehicle's behavior is determined by the vehicle it follows; under this assumption the algorithm's complexity is quadratic in the number of agents.

Reference [70] integrates the idea of the Stackelberg game into the modeling process: the leader vehicle acts against the worst-case behavior of the following vehicle, where leader and follower are only relative roles and the scheme is applied to every vehicle. Compared with the decision-tree method, this algorithm's complexity is only linear in the number of agents, but in practice the decision-tree algorithm performs better.

4.3 Probabilistic Approaches

This class is just what the name says; in fact, several of the other classes also build their algorithms on probability to a greater or lesser degree.

In [71], Wei proposed an algorithm based on the Markov decision process for vehicle merging at highway on-ramps. The algorithm provides a feasible policy set for the two merging vehicles, and each vehicle simply selects the optimal policy from the set according to its own cost function. Wei models social behavior with a Bayesian model: for example, a car that accelerates signals that it wants to cut in, while one that decelerates signals that it is yielding.

Werling also proposed a path planning method for merge junctions in [72]. When modeling the environment's reaction, one can use the Intelligent Driver Model (IDM), a continuous car-following model for urban traffic and highways. It describes, from a microscopic perspective, the longitudinal position and speed of a single vehicle in the traffic flow; because it also takes the states of other vehicles, such as acceleration, into its own cost function, it achieves a certain degree of coordination. A sketch of the IDM follows.
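A sketch of the IDM acceleration law; the parameter values are typical defaults from the car-following literature, not taken from [72]:

```python
import math

def idm_accel(v, v_lead, gap, v0=30.0, T=1.5, a=1.0, b=2.0, s0=2.0, delta=4):
    """Intelligent Driver Model: longitudinal acceleration of the following car.
    v and v_lead in m/s; gap is the bumper-to-bumper distance to the leader in m."""
    dv = v - v_lead                                          # closing speed
    s_star = s0 + v * T + v * dv / (2.0 * math.sqrt(a * b))  # desired dynamic gap
    return a * (1.0 - (v / v0) ** delta - (s_star / gap) ** 2)

# Example: approaching a slower leader 20 m ahead -> the model brakes
acc = idm_accel(v=25.0, v_lead=20.0, gap=20.0)
```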

Hoermann [73] used particle filtering to estimate the behavior parameters of the model, such as maximum acceleration and desired acceleration; the posterior probability density obtained this way is used for the state update.

Dong [74] used a probabilistic graphical model to estimate the behavior of external vehicles. The method is simple to describe: it generates the most probable behavior given the information it observes.

Reference [62] uses Gaussian processes to predict the trajectories of external agents.

4.4 Partially Observable Markov Decision Processes

Abbreviated POMDP, this is a branch of the probabilistic methods. A POMDP can treat the intentions of other agents as hidden variables; reference [75] proposes a POMDP that takes other vehicles' driving intentions as hidden variables. POMDPs also capture the trade-off between exploration (gathering information) and exploitation (making progress toward the goal). Reference [76] uses this method to drive autonomous vehicles on urban roads.

The POMDP community focuses on offline solving. Offline solving aims at the best possible model, but it makes the algorithm slow to run, which is unacceptable for autonomous vehicles. To improve efficiency, the POMDP can predict only the currently reachable states and plan a rough trajectory rather than a detailed one.

Reference [77] applies a tree-based method for policy evaluation. Reference [78] proposes real-time multi-policy estimation with receding-horizon control for urban roads: it predicts other agents' trajectories with a POMDP and combines nonlinear MPC to plan a safe path.

Brechtel [79] proposed a method for balancing exploration and exploitation to address the problem of incomplete perceptual information.

Reference [80] proposed a parameter-free reinforcement learning method to quickly obtain an approximately optimal POMDP solution, but its generalization ability is relatively poor.

4.5 Learning-Based Approaches

These are machine-learning-based methods; the deep learning and deep reinforcement learning mentioned earlier are not repeated here.

Vallon [81] used an SVM to decide the lane-change strategy, with relative position and relative speed as features; once the lane-change decision is made, local path planning is carried out with MPC.

Lenz [82] used a neural network to train a Gaussian mixture model, with features including the vehicle's current and past states and the shape of the road. The fully connected network performed better than a recurrent neural network and also better than the Intelligent Driver Model mentioned in Section 4.3.

Reference [83] imitates the driving style of expressways, and its algorithm shows the great potential of MDP-based decision making; reference [84] further improves the algorithm on that basis.

Inverse reinforcement learning (IRL), also called inverse optimal control, can provide the reward function that is usually hard to specify for reinforcement learning, and it is not prone to overfitting; on this basis, RL then searches for the optimal policy.

Ziebart [85] used IRL for decision control. Maximum-entropy IRL is a well-known method; references [86][87][88] used it for motion planning that accounts for social awareness and human behavior, and reference [89] uses it for adaptive navigation with priorities.

Maximum margin planning is an extension of IRL. Reference [90] uses it to navigate robots on unstructured roads, and reference [91] uses it to learn autonomous driving styles and strategies.

Reference [92] designed a framework that lets IRL take the expert's risk sensitivity into consideration; the framework uses a linear algorithm to infer the expert's risk level.

Reference [93] uses a maximum-entropy deep IRL network to exploit the full ability of deep fully convolutional networks to describe the cost model of driving behavior.

Reference [94] demonstrates the effectiveness of generative adversarial networks and extends them to the optimization of recurrent policies. Combining IRL with RL is relatively inefficient, whereas a generative adversarial network can obtain the policy directly from data, and the method can guarantee fast behavioral responses over a longer time horizon.

5

Verification and Synthesis

The algorithms produced by all the methods above need thorough safety evaluation. Reference [95] pointed out that evaluating the safety of a car requires millions of miles of driving, which would take ten years to complete. As for simulation testing, no matter how it is optimized, there is always a gap with real life. A better methodological framework is therefore needed to give an absolute proof of safety; that is the problem discussed in this section.

Reference [96] proposed a synthesis method for low-complexity tasks, such as adaptive cruise control and the on-board network of a single vehicle, but the method's computational efficiency is very low.

In contrast, formal verification is more efficient. In distributed systems, model checking is commonly used for formal verification: it examines the state space of the constructed model to determine whether the system meets its requirements. In [97], the California Institute of Technology used this method to check the state consistency between software modules and the safety of adaptive cruise control.

Reference [98] uses reachability analysis for online checking under conservative linearization, uses zonotopes as the set representation, and queries a database for emergency maneuvers. Online checking requires a probabilistic representation of the external environment, which greatly increases the complexity of the algorithm.

Reference [99] proposes using reachability analysis to check whether motion planning results are executable.

Beyond online checking, reference [100] proposes building a library of road models for a given area. This method applies well to checking urban-road controllers and networks, but it has the drawback of not considering all possible actions.

Reference [101] addresses the safety verification problem of deep neural networks using traditional tools such as SMT (satisfiability modulo theories) solvers.

Reference [102] pointed out five verification problems facing AI: modeling complex environments; modeling the system itself; precisely specifying the system's important properties; vectorized operations; and quantitatively characterizing the data required for training.

6

Problems to be solved

Beyond the ever-present issues of algorithm efficiency and modeling completeness, future work must also take driving safety into account, as follows:

  1. Vehicle models should be closer to reality. Most existing work simplifies the vehicle to a 2-degree-of-freedom bicycle model; comfort constraints should gradually be added

  2. From the perspective of path planning, the driver can be included in closed-loop control through an HMI

  3. Better incorporate environmental uncertainty into decision control algorithms

  4. Continuously improve the solving efficiency of algorithms

  5. Improve the generalization ability of the algorithm


Source | Zhihu · Author: ISmiracle
