An overview of the technical framework of autonomous driving

Source of this article: Frontier of Smart Driving

The core of an autonomous driving system can be summarized as three parts: perception (Perception), planning (Planning), and control (Control). The interaction of these parts with one another, with the vehicle hardware, and with other vehicles can be represented by the following diagram:

[Figure: perception-planning-control system architecture]

Perception refers to the ability of an autonomous driving system to collect information from the environment and extract relevant knowledge from it. Within perception, environmental perception (Environmental Perception) specifically refers to the ability to understand the scene, such as the location of obstacles, the detection of road signs and markings, and the detection and semantic classification of pedestrians and vehicles. Generally speaking, localization is also considered part of perception: it is the ability of the autonomous vehicle to determine its position relative to the environment.

Planning is the process by which the autonomous vehicle makes purposeful decisions toward a goal. For an autonomous vehicle, this goal is usually to reach the destination from the starting point while avoiding obstacles, continuously optimizing the driving trajectory and behavior to ensure the safety and comfort of passengers. The planning layer is usually subdivided into three layers: mission planning (Mission Planning), behavioral planning (Behavioral Planning), and motion planning (Motion Planning).

Finally, control is the ability of the autonomous vehicle to accurately execute the actions planned by the higher layers.

Perception

Environmental Perception

To ensure the vehicle's understanding and grasp of the environment, the environmental perception part of an autonomous driving system usually needs to obtain a large amount of information about the surroundings, specifically including: the position, speed, and possible behavior of obstacles; the drivable area; traffic rules; and so on. Autonomous vehicles usually obtain this information by fusing data from multiple sensors such as Lidar, cameras, and millimeter-wave radar. Below we look at the application of Lidar and cameras in perception.

Lidar is a class of device that uses laser light for detection and ranging; it can emit millions of light pulses into the environment every second. Its interior is a rotating structure, which allows Lidar to build a 3D map of the surrounding environment in real time.

Generally speaking, a lidar rotates and scans the surrounding environment at a rate of about 10 Hz. The result of one scan is a three-dimensional map composed of dense points, each carrying (x, y, z) information; this map is called a point cloud (Point Cloud). The figure below shows a point cloud created with a Velodyne VLP-32c lidar:
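In code, a point cloud is simply an N x 3 array of (x, y, z) coordinates. A minimal sketch with NumPy (the points here are randomly generated for illustration, not a real scan):

```python
import numpy as np

# A point cloud is an N x 3 array of (x, y, z) coordinates.
# The scan size here is made up for illustration; a real scan from a
# 32-beam lidar contains far more points.
rng = np.random.default_rng(0)
points = rng.uniform(low=[-50, -50, -2], high=[50, 50, 4], size=(1000, 3))

# Typical basic queries: the range (distance from the sensor) of each
# point, and a height filter that keeps points above the road surface.
ranges = np.linalg.norm(points, axis=1)
above_ground = points[points[:, 2] > 0.3]
print(points.shape, ranges.shape, above_ground.shape)
```

All downstream processing (segmentation, classification, registration) operates on arrays of this shape.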

[Figure: point cloud map from a Velodyne VLP-32c lidar]

Lidar is still the most important sensor in autonomous driving systems because of its reliability. In actual use, however, Lidar is not perfect: point clouds are often too sparse, and some points may even be lost. Lidar can also have difficulty discerning patterns on the surfaces of irregular objects, and it cannot be used in conditions such as heavy rain.

To make sense of point cloud information, we generally perform two steps on point cloud data: segmentation (Segmentation) and classification (Classification). Segmentation clusters the discrete points in the point cloud into several coherent objects, and classification determines which category these objects belong to (such as pedestrians, vehicles, or other obstacles). Segmentation algorithms can be grouped into the following categories:

  1. Edge-based methods, such as gradient filtering;

  2. Region-based methods, which cluster neighboring points using regional features according to some specified criterion (such as Euclidean distance or surface normals); these methods usually first select several seed points in the point cloud and then grow clusters outward from the seeds using the chosen criterion;

  3. Parametric methods, which fit the point cloud to a predefined model; common examples include Random Sample Consensus (RANSAC) and the Hough Transform (HT);

  4. Attribute-based methods, which first compute attributes for each point and then cluster points according to those attributes;

  5. Graph-based methods;

  6. Machine-learning-based methods.
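To make the parametric family concrete, here is a toy RANSAC ground-plane segmentation on a synthetic cloud. The scene, thresholds, and iteration count are all illustrative assumptions, not values from a production system:

```python
import numpy as np

def ransac_ground_plane(points, n_iters=100, threshold=0.2, seed=0):
    """Fit a ground plane z = a*x + b*y + c with RANSAC.

    Returns the plane coefficients and a boolean inlier mask.
    """
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    best_coeffs = None
    for _ in range(n_iters):
        # Hypothesize a plane from 3 random points.
        sample = points[rng.choice(len(points), size=3, replace=False)]
        A = np.c_[sample[:, :2], np.ones(3)]  # solve [x y 1] @ [a b c]^T = z
        try:
            coeffs = np.linalg.solve(A, sample[:, 2])
        except np.linalg.LinAlgError:
            continue  # degenerate (collinear) sample
        # Score the hypothesis by how many points lie close to the plane.
        residuals = np.abs(points[:, :2] @ coeffs[:2] + coeffs[2] - points[:, 2])
        inliers = residuals < threshold
        if inliers.sum() > best_inliers.sum():
            best_inliers, best_coeffs = inliers, coeffs
    return best_coeffs, best_inliers

# Synthetic scene: flat ground near z = 0 plus an obstacle cluster above it.
rng = np.random.default_rng(1)
ground = np.c_[rng.uniform(-20, 20, (500, 2)), rng.normal(0, 0.05, 500)]
obstacle = np.c_[rng.uniform(2, 4, (50, 2)), rng.uniform(0.5, 2.0, 50)]
cloud = np.vstack([ground, obstacle])

coeffs, inliers = ransac_ground_plane(cloud)
print("ground points found:", inliers.sum())
```

Removing the inliers leaves the non-ground points, which can then be clustered into candidate obstacles.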

After segmentation of the point cloud is complete, the segmented targets need to be correctly classified. This step generally uses classification algorithms from machine learning, such as the Support Vector Machine (SVM), to classify features extracted from the clusters. In recent years, with the development of deep learning, the industry has begun to use specially designed convolutional neural networks (CNNs) to classify three-dimensional point cloud clusters directly.

However, whether one extracts features and applies an SVM or feeds raw point clouds to a CNN, the low resolution of the lidar point cloud means that classification is unreliable for targets with sparse reflection points (such as pedestrians). In practice, therefore, we often fuse lidar and camera sensors: the camera's high resolution is used to classify targets, the lidar's reliability is used to detect and range obstacles, and the strengths of the two are combined to complete environmental perception.

In autonomous driving systems, we usually use computer vision on camera images for road detection and on-road target detection. Road detection includes lane line detection (Lane Detection) and drivable area detection (Drivable Area Detection); on-road target detection includes vehicle detection (Vehicle Detection), pedestrian detection (Pedestrian Detection), traffic sign and signal detection (Traffic Sign Detection), and the detection and classification of all other traffic participants.

Lane line detection involves two aspects: the first is identifying the lane lines themselves (for curved lane lines, this includes computing their curvature); the second is determining the vehicle's offset relative to the lane (that is, where the vehicle sits within the lane). One method is to extract lane features, including edge features (usually gradients, e.g. via the Sobel operator) and the color features of lane lines, fit a polynomial to the pixels we believe may belong to a lane line, and then, based on the polynomial and the mounted camera's position on the vehicle, determine the curvature of the lane ahead and the vehicle's deviation relative to the lane.
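The polynomial-fitting step can be sketched in a few lines. Here the "detected lane pixels" are synthetic points in a hypothetical vehicle frame (x forward, y lateral, in meters), and the curvature radius follows the standard formula R = (1 + f'(x)^2)^(3/2) / |f''(x)|:

```python
import numpy as np

# Hypothetical lane-pixel coordinates in the vehicle frame:
# x is forward distance, y is the lateral position of detected lane pixels.
x = np.linspace(0, 30, 50)
true_curve = 0.002 * x**2 + 0.1 * x + 1.5
y = true_curve + np.random.default_rng(0).normal(0, 0.02, x.size)

# Fit a 2nd-order polynomial y = A*x^2 + B*x + C to the candidate pixels.
A, B, C = np.polyfit(x, y, deg=2)

# Radius of curvature at x0: R = (1 + (2*A*x0 + B)^2)^(3/2) / |2*A|
x0 = 0.0
radius = (1 + (2 * A * x0 + B) ** 2) ** 1.5 / abs(2 * A)

# Lateral offset of the vehicle (at x = 0) relative to the lane line.
offset = C
print(f"radius ~ {radius:.0f} m, lateral offset ~ {offset:.2f} m")
```

In a real pipeline the pixels would first be warped from image coordinates into this metric frame via the camera calibration.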

A current method for drivable area detection is to use a deep neural network to segment the scene directly, that is, to train a pixel-wise classification network that carves the drivable area out of the image.

The detection and classification of traffic participants currently relies mainly on deep learning models. Two families of models are commonly used:

  • Region-proposal-based deep learning detection algorithms, represented by R-CNN (R-CNN, SPP-Net, Fast R-CNN, Faster R-CNN, etc.);

  • Regression-based deep learning detection algorithms, represented by YOLO (YOLO, SSD, etc.).

Localization

At the perception level, the importance of localization is self-evident: the autonomous vehicle needs to know its exact position relative to the environment, and the localization error here must not exceed roughly 10 cm. Imagine that the localization error of our vehicle were 30 cm: it would be a very dangerous vehicle (for pedestrians and passengers alike), because the planning and control layers do not know about the 30 cm error and still make decisions and issue controls on the premise that the position is precise. In certain situations those decisions would be wrong and cause accidents. Clearly, autonomous vehicles require high-precision localization.

At present, the most widely used localization approach for autonomous vehicles is to fuse the Global Positioning System (GPS) with an Inertial Navigation System (INS). GPS provides a localization accuracy ranging from tens of meters down to the centimeter level, but high-precision GPS sensors are correspondingly expensive. The GPS/IMU approach also cannot achieve high-precision localization where the GPS signal is missing or weak, such as in underground parking garages or urban areas surrounded by high-rise buildings, so it is only applicable to autonomous driving tasks in certain scenarios.

Map-assisted localization is another widely used class of algorithms for autonomous vehicles; Simultaneous Localization And Mapping (SLAM) is representative of this class. The goal of SLAM is to build a map and use that map for localization at the same time: SLAM uses the environmental features it has observed to estimate the current vehicle position as well as the positions of the currently observed features.

This is a process of estimating the current position from previous priors and current observations. In practice we usually implement it with a Bayesian filter, specifically the Kalman Filter, the Extended Kalman Filter, or the Particle Filter.
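The predict-update cycle of a Bayesian filter is easiest to see in the linear (Kalman) case. A minimal 1-D sketch, estimating position and velocity from noisy position measurements (all noise values and the constant-velocity model are illustrative assumptions):

```python
import numpy as np

# Minimal 1-D Kalman filter: estimate a vehicle's position and velocity
# from noisy GPS-like position measurements, assuming a constant-velocity
# motion model. All numbers are illustrative, not real sensor specs.
dt = 0.1
F = np.array([[1, dt], [0, 1]])        # state transition for [position, velocity]
H = np.array([[1.0, 0.0]])             # we only measure position
Q = np.diag([0.01, 0.01])              # process noise covariance
R = np.array([[1.0]])                  # measurement noise covariance

x = np.array([0.0, 0.0])               # initial state estimate
P = np.eye(2) * 10.0                   # initial uncertainty

rng = np.random.default_rng(0)
true_pos, true_vel = 0.0, 5.0
for _ in range(100):
    true_pos += true_vel * dt
    z = true_pos + rng.normal(0, 1.0)  # noisy position measurement
    # Predict step: propagate the prior through the motion model.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update step: correct the prediction with the observation.
    y = z - H @ x                      # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P

print(f"estimated position {x[0]:.1f}, true position {true_pos:.1f}")
```

The Extended Kalman Filter follows the same cycle but linearizes a nonlinear motion or measurement model at each step; the particle filter replaces the Gaussian belief with a set of weighted samples.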

Although SLAM is a research hotspot in robot localization, using SLAM in actual autonomous vehicle development is problematic. Unlike a typical robot, an autonomous vehicle travels long distances in large, open environments, and over long-distance motion the drift of SLAM localization gradually accumulates, eventually causing localization to fail.

In practice, an effective localization method for autonomous vehicles is to adapt the scan matching algorithms of the original SLAM formulation. Specifically, we no longer map while localizing; instead, we use sensors such as lidar to construct a point cloud map of the area in advance and, through a combination of programmatic and manual processing, add a layer of "semantics" to the map (such as specific lane line markings, the road network, traffic light positions, and the traffic rules of each road section). The result is a high-precision map (HD Map) for autonomous vehicles.

During actual localization, the current lidar scan is matched against the pre-built high-precision map to determine the vehicle's exact position on the map. Methods of this type are collectively called scan matching (Scan Matching). The most common scan matching method is the Iterative Closest Point (ICP) algorithm, which completes point cloud registration based on distance measures between the current scan and the target scan.
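The ICP loop alternates two steps: match each scan point to its closest map point, then solve for the rigid transform that best aligns the matched pairs. A toy 2-D sketch with brute-force nearest neighbours and the SVD (Kabsch) closed-form alignment; real implementations use k-d trees and robust outlier rejection:

```python
import numpy as np

def icp_2d(source, target, n_iters=20):
    """Toy 2-D ICP: iteratively align `source` to `target`."""
    src = source.copy()
    for _ in range(n_iters):
        # 1. Nearest-neighbour correspondences (brute force for clarity).
        d = np.linalg.norm(src[:, None, :] - target[None, :, :], axis=2)
        matched = target[d.argmin(axis=1)]
        # 2. Closed-form rigid alignment of the matched sets (Kabsch/SVD).
        mu_s, mu_t = src.mean(axis=0), matched.mean(axis=0)
        U, _, Vt = np.linalg.svd((src - mu_s).T @ (matched - mu_t))
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:       # guard against reflections
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = mu_t - R @ mu_s
        src = src @ R.T + t
    return src

# Target "map" points, and a source scan displaced by a small rigid motion.
rng = np.random.default_rng(0)
target = rng.uniform(-10, 10, (200, 2))
theta = 0.05
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
source = target @ R_true.T + np.array([0.3, -0.2])

aligned = icp_2d(source, target)
err = np.linalg.norm(aligned - target, axis=1).mean()
print(f"mean alignment error after ICP: {err:.4f}")
```

The accumulated transform that maps the raw scan onto the map is exactly the vehicle's pose correction.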

In addition, the Normal Distributions Transform (NDT) is another common point cloud registration method; it models the reference point cloud as a collection of local normal distributions over grid cells and registers the scan against them. Localization methods based on point cloud registration can also achieve accuracy within 10 cm.

Although point cloud registration can provide the vehicle's global position relative to the map, such methods rely heavily on the pre-built high-precision map and still need to be combined with GPS localization on open road sections (such as highways); the combination of GPS plus point cloud matching is relatively expensive.

Planning

Mission Planning

The hierarchical structure of autonomous driving planning systems originated from the DARPA Urban Challenge held in 2007. In that competition, most teams divided the planning module into three layers: mission planning, behavioral planning, and motion planning. Mission planning is usually also called route planning (Route Planning) or path planning; it is responsible for the relatively top-level planning of the route, such as the choice of roads from the starting point to the destination.

We can process our current road system into a directed graph network (Directed Graph Network) that represents information such as roads, the connections between roads, traffic rules, and road widths. This corresponds to the "semantic" part of the high-precision map mentioned in the localization section above. Such a directed graph is called a route network graph (Route Network Graph), as shown in the following figure:

[Figure: route network graph]

Each directed edge in such a route network graph carries a weight. The path planning problem for the autonomous vehicle then becomes: in order to achieve a certain goal in the route network graph (usually to get from A to B), select the optimal (minimum-cost) path according to some method. The problem thus reduces to a directed graph search, and traditional search algorithms such as Dijkstra's Algorithm and the A* Algorithm, which compute optimal paths in discrete graphs, are used to find the minimum-cost path in the route network graph.
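A minimal Dijkstra search over a toy route network illustrates the idea (the graph and weights below are made up; in practice edge weights encode road length, expected travel time, and traffic rules):

```python
import heapq

def dijkstra(graph, start, goal):
    """Shortest path on a weighted directed road-network graph.

    `graph` maps each node to a list of (neighbour, edge_cost) pairs.
    Returns (total_cost, path).
    """
    queue = [(0.0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbour, edge_cost in graph.get(node, []):
            if neighbour not in visited:
                heapq.heappush(queue, (cost + edge_cost, neighbour, path + [neighbour]))
    return float("inf"), []

# A toy route network: nodes are intersections, weights are costs.
road_network = {
    "A": [("B", 5), ("C", 2)],
    "B": [("D", 4)],
    "C": [("B", 1), ("D", 7)],
    "D": [],
}
cost, path = dijkstra(road_network, "A", "D")
print(cost, path)   # 7.0 ['A', 'C', 'B', 'D']
```

A* works the same way but adds a heuristic (e.g. straight-line distance to the goal) to the priority, which prunes the search on large road networks.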

Behavioral Planning

Behavioral planning is sometimes called the decision-making layer (Decision Maker). Its main task is, given the goal from mission planning and the current local situation (the positions and behavior of other vehicles and pedestrians, the current traffic rules, and so on), to decide which behavior the vehicle should execute next. It can be understood as the vehicle's co-pilot: based on the goal and the current traffic conditions, it instructs the driver whether to follow or overtake, whether to stop and wait for a pedestrian to pass or to go around, and so on.

One approach to behavioral planning is to use a complex finite state machine (Finite State Machine, FSM) containing a large number of action states. The finite state machine starts from a basic state and, depending on the driving scenario, jumps between different action states, passing the chosen action down to the motion planning layer below. The following figure shows a simple finite state machine:

[Figure: a simple behavioral finite state machine]

As the figure shows, each state is a decision about the vehicle's action, there are defined jump conditions between states, and some states can loop back to themselves (such as the tracking and waiting states in the figure). Although it is currently the mainstream behavioral decision method in autonomous vehicles, the finite state machine has significant limitations: realizing complex behavioral decisions requires manually designing a large number of states; the vehicle may encounter a situation the state machine never considered; and, if the machine is not designed with deadlock protection, the vehicle may even get stuck in a deadlock.
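A behavioral FSM can be expressed as a simple transition table. The states and trigger events below are illustrative, not from any production system; note how an unknown event leaves the state unchanged, which is exactly the self-loop behavior described above:

```python
# A toy behavioral finite state machine as a (state, event) -> state table.
TRANSITIONS = {
    ("cruise", "slow_car_ahead"): "follow",
    ("follow", "overtake_clear"): "overtake",
    ("overtake", "lane_clear"): "cruise",
    ("cruise", "pedestrian"): "wait",
    ("follow", "pedestrian"): "wait",
    ("wait", "path_clear"): "cruise",
}

def step(state, event):
    """Return the next state; unknown events self-loop (state unchanged)."""
    return TRANSITIONS.get((state, event), state)

state = "cruise"
events = ["slow_car_ahead", "pedestrian", "path_clear",
          "slow_car_ahead", "overtake_clear"]
for event in events:
    state = step(state, event)
print(state)  # overtake
```

The limitation is visible even here: every scenario the designer did not enumerate in the table silently self-loops, which is the "unconsidered state" problem in miniature.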

Motion Planning

The process of planning a series of actions to achieve some purpose (such as avoiding obstacles) is called motion planning. Two metrics are usually used to evaluate motion planning algorithms: computational efficiency (Computational Efficiency) and completeness (Completeness). Computational efficiency refers to how efficiently a plan can be computed, which depends to a large extent on the configuration space (Configuration Space). If an algorithm returns a solution in finite time whenever the problem has one, and reports failure when there is none, we call the algorithm complete.

Configuration space: the set of all possible configurations of the robot, which defines the dimensions in which the robot can move. In the simplest two-dimensional discrete problem the configuration space is just [x, y]; the configuration space of an autonomous vehicle can be far more complex, depending on the motion planning algorithm used.

With the concept of configuration space in hand, motion planning for an autonomous vehicle becomes: given an initial configuration (Start Configuration), a goal configuration (Goal Configuration), and several constraints (Constraints), find a series of actions in the configuration space whose execution transfers the vehicle from the initial configuration to the goal configuration while satisfying the constraints.

In autonomous driving, the initial configuration is usually the vehicle's current state (position, velocity, angular velocity, and so on); the goal configuration comes from the layer above motion planning, the behavioral planning layer; and the constraints are the vehicle's motion limits (maximum steering angle, maximum acceleration, and so on).

Clearly, the amount of computation for motion planning in a high-dimensional configuration space is enormous: to guarantee completeness, we would have to search nearly all possible paths, which gives rise to the "curse of dimensionality" in continuous motion planning. The core idea for solving this problem is to convert the continuous space model into a discrete one, using methods that fall into two classes: combinatorial planning (Combinatorial Planning) and sampling-based planning (Sampling-Based Planning).

Combinatorial methods for motion planning find paths through the continuous configuration space without resorting to approximations; because of this property they are also called exact algorithms. A combinatorial method finds a complete solution by building a discrete representation of the planning problem. For example, in the DARPA Urban Challenge, CMU's autonomous vehicle BOSS used a motion planner that first generates candidate paths and goal points (which are reachable under the vehicle's dynamics), and then selects the optimal path by optimization.

Another discretization method is grid decomposition (Grid Decomposition Approaches). After gridding the configuration space, we can usually apply a discrete graph search algorithm (such as A*) to find an optimal path.

Sampling-based methods are widely used thanks to their probabilistic completeness; the most common algorithms include PRM (Probabilistic Roadmaps), RRT (Rapidly-exploring Random Trees), and FMT (Fast Marching Trees). When applied to autonomous vehicles, state sampling must respect the control constraints between two states, and an efficient method is needed to check whether a sampled state is reachable from its parent state. Later we will introduce State-Lattice Planners, a sampling-based motion planning algorithm, in detail.
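To make the sampling idea concrete, here is a toy 2-D RRT in a square world with a circular obstacle. It ignores vehicle kinematics entirely (every straight-line step is assumed feasible), so it is a sketch of the tree-growing mechanism, not a drivable planner:

```python
import numpy as np

def rrt(start, goal, obstacles, n_samples=2000, step_size=1.0, goal_tol=1.0, seed=0):
    """Toy 2-D RRT in a 20x20 world with circular obstacles.

    `obstacles` is a list of (centre, radius). Returns the path found,
    or None if no sample reached the goal region.
    """
    rng = np.random.default_rng(seed)
    nodes = [np.asarray(start, dtype=float)]
    parents = [0]

    def collides(p):
        return any(np.linalg.norm(p - c) < r for c, r in obstacles)

    for _ in range(n_samples):
        sample = rng.uniform(0, 20, 2)
        # Extend the nearest tree node one step toward the random sample.
        dists = [np.linalg.norm(sample - n) for n in nodes]
        nearest = int(np.argmin(dists))
        direction = sample - nodes[nearest]
        new = nodes[nearest] + step_size * direction / (np.linalg.norm(direction) + 1e-9)
        if collides(new):
            continue
        nodes.append(new)
        parents.append(nearest)
        if np.linalg.norm(new - goal) < goal_tol:
            # Walk back up the tree to reconstruct the path.
            path, i = [new], len(nodes) - 1
            while i != 0:
                i = parents[i]
                path.append(nodes[i])
            return path[::-1]
    return None

obstacles = [(np.array([10.0, 10.0]), 3.0)]
path = rrt(start=(1.0, 1.0), goal=np.array([18.0, 18.0]), obstacles=obstacles)
print("path found:", path is not None)
```

A kinodynamic variant would replace the straight-line extension with a short feasible motion primitive and check reachability between the sampled state and its parent, as noted above.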

Control

The control layer is the lowest layer of the autonomous driving system. Its task is to realize the actions we have planned, so the evaluation metric for the control module is control accuracy. The control system takes measurements internally, and the controller compares the vehicle's measurements against the expected state to produce its control outputs; this process is called feedback control (Feedback Control).

Feedback control is widely used in automatic control, and the most typical feedback controller is the PID controller (Proportional-Integral-Derivative Controller). The control principle of a PID controller is to build the control output from a simple error signal using three terms: the proportion of the error (Proportional), the integral of the error (Integral), and the derivative of the error (Derivative).
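A discrete-time PID controller is a few lines of code. The speed-control loop below uses a made-up first-order longitudinal model and illustrative gains, purely to show the three terms at work:

```python
class PID:
    """Minimal PID controller: u = Kp*e + Ki*integral(e) + Kd*de/dt."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, error):
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# Illustrative speed control with a toy first-order vehicle response.
pid = PID(kp=0.5, ki=0.1, kd=0.05, dt=0.1)
speed, target = 0.0, 10.0
for _ in range(200):
    u = pid.update(target - speed)
    speed += (u - 0.1 * speed) * 0.1   # toy longitudinal dynamics with drag
print(f"speed after 20 s: {speed:.2f} m/s (target {target})")
```

The integral term is what removes the steady-state offset that a purely proportional controller would leave against the drag term.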

PID is still the most widely used controller in industry because of its simple implementation and stable performance. However, as a pure feedback controller, PID has problems in autonomous vehicle control: it acts purely on the current error feedback, and the delay of the actuation mechanism (braking, for example) introduces delay into the control loop; since PID carries no internal system model, it cannot model this delay. To solve this problem, we introduce model-based predictive control. A model predictive controller consists of the following components:

  • Prediction model: a model that predicts the state over a future period based on the current state and control inputs; in an autonomous vehicle system this usually means the vehicle's kinematic/dynamic model;

  • Feedback correction: the process of applying feedback correction to the model, giving the predictive controller a strong ability to resist disturbances and overcome system uncertainty;

  • Rolling optimization: the control sequence is optimized in a rolling (receding-horizon) fashion to obtain the predicted sequence closest to the reference trajectory;

  • Reference trajectory: the desired trajectory.
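The components above can be sketched in miniature. The 1-D "vehicle" below tracks a reference position with a kinematic prediction model, and a brute-force search over a small discrete set of speed commands stands in for the real optimizer; everything here (model, horizon, cost weights) is an illustrative assumption:

```python
import itertools

# Minimal receding-horizon (MPC-style) sketch for a 1-D point tracking
# a reference position with a kinematic model pos += v * dt.
dt, horizon = 0.2, 5
commands = [-1.0, 0.0, 1.0]            # candidate speed inputs
reference = 5.0                        # reference trajectory (a fixed setpoint)

def rollout(pos, seq):
    """Predict the tracking cost of a command sequence using the model."""
    cost = 0.0
    for v in seq:
        pos += v * dt                  # prediction model
        cost += (pos - reference) ** 2 + 0.01 * v ** 2
    return cost

pos = 0.0
for _ in range(50):
    # Rolling optimization: pick the best sequence over the horizon...
    best = min(itertools.product(commands, repeat=horizon),
               key=lambda seq: rollout(pos, seq))
    # ...but apply only its first input, then re-plan from the new state.
    pos += best[0] * dt
print(f"position after 10 s: {pos:.2f} (reference {reference})")
```

Re-planning from the measured state at every step is the feedback-correction element: any mismatch between model and reality is absorbed at the next iteration. A real MPC replaces the brute-force search with a constrained quadratic or nonlinear program over the bicycle model.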

The figure below shows the basic structure of model predictive control. Because model predictive control optimizes on the basis of a motion model, the control delay problem faced by PID control can be accounted for in the model, so model predictive control has high application value in autonomous vehicle control.

[Figure: basic structure of model predictive control]

Epilogue

In this section we have outlined the basic structure of autonomous driving systems. Autonomous driving software is usually divided into three layers: perception, planning, and control; to some extent, under this layered architecture an autonomous vehicle can be regarded as a "manned robot". Perception specifically includes environmental perception and localization. Breakthroughs in deep learning in recent years have given image-based perception technology an increasingly important role in environmental perception: with the help of artificial intelligence, we are no longer limited to detecting obstacles, but can gradually understand what the obstacles are, understand the scene, and even predict the behavior of target obstacles. We will learn more about machine learning and deep learning in the next two chapters.

In actual autonomous vehicle perception, we usually need to fuse measurements from multiple sensors such as lidar, cameras, and millimeter-wave radar, which involves fusion algorithms such as the Kalman filter and the extended Kalman filter.

There are many localization methods for autonomous vehicles and robots. The current mainstream method is the fusion of GPS and an inertial navigation system; second to it are methods based on Lidar point cloud scan matching. Point-cloud-registration algorithms such as ICP and NDT will be introduced later.

The planning module is likewise divided into three layers: mission planning (also known as route planning), behavioral planning, and motion planning. Mission planning methods based on route networks and discrete graph search will be introduced later; for behavioral planning we will focus on the application of finite state machines to behavioral decision-making; at the motion planning level we will focus on sampling-based planning methods.

For the control module, we often use model-predictive control methods. Before studying the model predictive control algorithm, as a grounding in basic feedback control we have looked at the PID controller. We then examine the two simplest classes of vehicle models, the kinematic bicycle model and the dynamic bicycle model, and finally introduce model predictive control.

Although treating the autonomous vehicle as a robot and approaching the system with thinking developed for robotics is the industry consensus, there are also approaches that use artificial intelligence or intelligent agents directly to accomplish autonomous driving. Among them, end-to-end autonomous driving based on deep learning and driving agents based on reinforcement learning are current research hotspots.


Origin blog.csdn.net/weixin_55366265/article/details/130453296