Autonomous (Intelligent) Driving Series | (1) Introduction and Sensors

Description: This series covers the technology stack and ideas behind perception fusion for autonomous (intelligent) driving (some pictures and information come from the Internet).

This series organizes and reviews what I have learned and shares it with those who want to study this area. This is the first article in the series; it is mainly an introduction, and later articles will focus more on algorithm understanding and implementation.

This article is divided into two parts: an introduction to autonomous driving and its sensors, and sensor calibration.

Table of contents

1. Introduction to autonomous driving and sensors

1.1 Autonomous Driving Classification and Industry Chain

1.2 Common sensors in the perception module

2. Sensor calibration

2.1 Camera

2.2 Lidar calibration (camera2Lidar)

2.3 Radar calibration

1. Introduction to autonomous driving and sensors

1.1 Autonomous Driving Classification and Industry Chain

Autonomous driving is classified into levels L0–L5 according to the SAE (Society of Automotive Engineers) standard:

At present, most companies have achieved L2 assisted driving, where the driver must remain engaged while the function is active; the most advanced systems today reach L4 automated driving.

The current industry chain is shown in the figure above. Broadly speaking, an autonomous driving system is divided into perception, decision-making, and execution modules.

Here we mainly focus on the perception module.

1.2 Common sensors in the perception module

The perception module can be seen as the basis for subsequent decision-making and execution, and includes sensors such as cameras, lidar, millimeter-wave radar, GPS, IMU, and so on.

The main task of perception is to sense the environment through hardware sensors. The objects of interest include road surfaces, static objects, and dynamic objects, covering road boundary detection, obstacle detection, vehicle detection, pedestrian detection, traffic signal detection, and so on. Detection alone is not enough: moving objects must also be tracked and their next positions predicted, which requires multi-sensor fusion. The data can come in the form of images, videos, point clouds, etc.

Sensor fusion is not new; it appeared as early as the last century. Traditionally it is based on statistical methods such as Kalman filtering. Now the development of neural networks has pushed autonomous driving forward: with massive data and rich innovation in network architectures, data-driven end-to-end models are developing rapidly.
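As a point of reference for the traditional statistical approach, here is a minimal sketch of a 1D constant-velocity Kalman filter that fuses noisy position measurements; the time step, noise levels, and measurement values are all illustrative assumptions, not taken from any real sensor.

```python
import numpy as np

# Minimal 1D constant-velocity Kalman filter: state x = [position, velocity].
dt = 0.1
F = np.array([[1.0, dt], [0.0, 1.0]])      # state transition model
H = np.array([[1.0, 0.0]])                 # we only measure position
Q = np.diag([1e-3, 1e-2])                  # process noise covariance
R = np.array([[0.25]])                     # measurement noise covariance

x = np.array([[0.0], [0.0]])               # initial state estimate
P = np.eye(2)                              # initial covariance

for z in [0.11, 0.23, 0.29, 0.42, 0.50]:   # fake position measurements (m)
    # predict
    x = F @ x
    P = F @ P @ F.T + Q
    # update with the new measurement
    y = np.array([[z]]) - H @ x            # innovation
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    x = x + K @ y
    P = (np.eye(2) - K @ H) @ P

print("estimated position %.3f m, velocity %.3f m/s" % (x[0, 0], x[1, 0]))
```

The same predict/update structure generalizes to multi-sensor fusion by feeding measurements from different sensors (with their own H and R) into the same state.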

Compared with traditional vision techniques, neural networks offer: 1. easier transfer to new targets, since a corresponding network can be trained as long as enough samples are available; 2. better robustness to occluded objects, thanks to strong feature-extraction ability; 3. robustness to lighting and other conditions.

With the development of deep learning, in addition to NLP and vision, network models for irregular, non-Euclidean data such as point clouds have emerged. These new techniques make it possible to bring lidar and millimeter-wave radar into the perception stack to improve performance.

However, autonomous driving perception still faces great challenges. Many large models achieve good results but cannot process data in real time; road conditions differ between countries and regions, and even between regions within a country, and are complex; and performance on video tasks is often not yet comparable to that on single-frame images.

An intelligent driving vehicle generally acquires information by fusing its on-board sensors with V2X vehicle-road coordination, using 5G and other communication methods to realize information interaction and sharing, and the overall control of the vehicle is completed based on this information. Here we only discuss perception based on sensor fusion.

First of all, we need to be clear that sensor perception is a combination of software and hardware: it involves not only hardware selection but also software algorithms. As mentioned earlier, the hardware includes cameras, millimeter-wave radar, ultrasonic radar, infrared detectors, IMU, GPS, lidar, and more. Multi-sensor fusion integrates information at different scales so that the sensors complement each other, improving the stability and fault tolerance of the system.

Let's start with vision. Computer vision has achieved amazing progress. Just as for humans, vision is the main way an autonomous vehicle perceives the world, and it plays an important role in traffic light and traffic sign recognition. Musk once said at a developer event that their pure-vision solution might see a "STOP" sign printed on a T-shirt and react to it; perhaps this is one reason why Tesla, with its "pure vision" approach, has also begun to move toward radar. By processing the collected images, tasks such as classification, segmentation, and tracking of traffic participants are realized. Cameras are strong at extracting semantic information, but their performance degrades badly when the light is too weak or too strong or the line of sight is blocked, and most importantly a visible-light camera cannot work around the clock. Some vehicles are therefore now equipped with passive infrared sensors to classify traffic participants, which has achieved certain results; a well-known example is the infrared manufacturer FLIR, whose BOSON can even reach 60 Hz, and which also released an infrared dataset (now taken offline). Fisheye and pinhole cameras are the most commonly used. As for stereo vision, with Microsoft's Kinect entering ordinary households, binocular and even trinocular cameras are also appearing in today's products.

Next, let's talk about lidar. Lidar can be classified in many ways: by wavelength band, by structure and scanning method, and so on. ToF (time of flight) is the principle behind mainstream lidar. 905 nm is the most common lidar band and is much cheaper than 1550 nm; both are in the infrared. Structurally, the most common type is the rotating mechanical lidar, whose most notable feature is its 360-degree FOV; it is the most mature type and has high precision. Its drawbacks are that the mechanical rotating structure may be a concern for long-term use and, most importantly, its price, although prices have dropped a lot with the entry of domestic manufacturers and continued technological development. According to industry insiders, "lidar development also follows Moore's law." Another type is the semi-solid-state lidar similar to Sagitar's M1, which is based on a MEMS mirror and scans through the resonance of a micro-motor galvanometer, with a horizontal field of view of 120 degrees. Schematic diagram of a MEMS lidar:

Lidar units currently fitted to intelligent driving vehicles:

The biggest advantage of lidar is its high resolution, in range, velocity, and angle. Dense point clouds make it possible to recognize people, vehicles, trees, buildings, and so on. However, its effective range drops significantly in rain and snow.

Millimeter-wave radar is a very common sensor; its good velocity resolution lets it measure the speed and angle of a target for safety warnings. The automotive radars used in this field are all FMCW: range and angle are obtained from the beat (difference-frequency) signal between the transmitted and received chirps, while Doppler velocity is measured from the phase relationship between successive chirps. Compared with lidar, it has strong penetration and responds strongly to metal objects. A wide sweep bandwidth brings fine range resolution, but generally more clutter has to be filtered out. Traditional radar only provides planar position and velocity information, with no height information; the emergence of 4D radar makes up for this and can generate denser point clouds, a promising direction that has not yet reached large-scale mass production.
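To make the FMCW relationships concrete, here is a small sketch that converts a beat frequency into range and a chirp-to-chirp phase difference into radial velocity. The chirp parameters and measured values are assumed numbers for illustration, not those of any particular radar product.

```python
import numpy as np

c = 3e8                      # speed of light (m/s)

# assumed 77 GHz FMCW chirp parameters (illustrative only)
f_carrier = 77e9             # carrier frequency (Hz)
bandwidth = 300e6            # chirp sweep bandwidth B (Hz)
t_chirp = 50e-6              # chirp duration Tc (s)
wavelength = c / f_carrier

slope = bandwidth / t_chirp  # chirp slope S (Hz/s)

# range from the beat (difference) frequency: f_b = 2*R*S/c
f_beat = 1.2e6               # measured beat frequency (Hz), assumed
R = c * f_beat / (2 * slope)

# radial velocity from the phase change between consecutive chirps:
# delta_phi = 4*pi*v*Tc / lambda
delta_phi = 0.4              # measured phase difference (rad), assumed
v = wavelength * delta_phi / (4 * np.pi * t_chirp)

print("range    = %.2f m" % R)      # ~30 m with these numbers
print("velocity = %.2f m/s" % v)    # ~2.5 m/s
```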

Ultrasonic radar is often used for obstacle avoidance, with a detection range of about 1–5 m, an accuracy of 1–3 cm, strong penetration, a simple structure, and a low price. It is usually installed on the front and rear bumpers and the sides of the car. Its disadvantages are temperature sensitivity and poor directionality. It is used for automatic parking and reversing assistance.

The IMU is the core of integrated inertial navigation; its core components are gyroscopes and accelerometers. Orientation is maintained from the gyroscopes, and position is propagated by integrating the accelerometers. Combining GPS with the IMU continuously corrects the IMU's long-term drift and aligns the IMU frame with the accurate global frame provided by GPS, keeping the current position and velocity up to date. RTK service makes GNSS even more accurate, but an IMU + GPS unit with RTK service is very expensive. The inertial navigation system is the core of the whole localization and fusion module.
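The drift-correction idea can be sketched very roughly: dead-reckon position from the IMU between fixes, then blend in each GPS fix to pull the estimate back. This is only a toy 1D complementary-style filter with made-up rates, biases, and noise levels, not a production GNSS/INS integration scheme.

```python
import numpy as np

rng = np.random.default_rng(0)
dt, n_steps = 0.01, 2000             # 100 Hz IMU samples, 20 s of data
gps_every = 100                      # one GPS fix per second
alpha = 0.2                          # how strongly a GPS fix corrects the estimate

true_acc = 0.5                       # constant true acceleration (m/s^2)
pos_true = vel_true = 0.0
pos_est = vel_est = 0.0

for k in range(n_steps):
    # ground truth motion
    vel_true += true_acc * dt
    pos_true += vel_true * dt
    # IMU dead reckoning with a small accelerometer bias -> drift over time
    acc_meas = true_acc + 0.05 + rng.normal(0, 0.02)
    vel_est += acc_meas * dt
    pos_est += vel_est * dt
    # periodic GPS fix pulls the position estimate back toward truth
    if k % gps_every == 0:
        gps_pos = pos_true + rng.normal(0, 1.0)
        pos_est = (1 - alpha) * pos_est + alpha * gps_pos

print("true position %.1f m, fused estimate %.1f m" % (pos_true, pos_est))
```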

Here is the summary:

2. Sensor calibration

The above introduces the main sensors of autonomous driving. Each sensor has its own coordinate system, so in order to have a unified mapping for objects in space, we need to calibrate the sensor setup into a unified spatio-temporal coordinate system. This is particularly important and strongly affects accuracy. Due to vehicle vibration and other factors, the setup needs to be re-calibrated at intervals. For a fixed mounting, we treat the relationship as a 3D rigid-body transformation.
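Written out, a 3D rigid-body transformation between two sensor frames consists of a rotation R and a translation t, usually stacked into a 4×4 homogeneous matrix:

$$
P_b = R\,P_a + t,
\qquad
\begin{bmatrix} P_b \\ 1 \end{bmatrix}
=
\begin{bmatrix} R & t \\ \mathbf{0}^{T} & 1 \end{bmatrix}
\begin{bmatrix} P_a \\ 1 \end{bmatrix},
\qquad R \in SO(3),\ t \in \mathbb{R}^{3}
$$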

2.1 Camera

Camera imaging can be simplified as a pinhole ("small hole") imaging model, which involves four coordinate systems: the world coordinate system (in m), the camera coordinate system (in mm or m), the image (physical) coordinate system (i.e. the physical imaging plane, the film plane in the figure, generally in cm), and the pixel coordinate system (in pixels).

For specific derivation, please refer to: https://zhuanlan.zhihu.com/p/476032066

To derive the intrinsic parameters, the relationship between a point in the camera coordinate system and its pixel coordinates is written in homogeneous form (a reference version is given below).

fx and fy are the equivalent (scaled) focal lengths, and cx and cy are the offsets of the principal point, all in pixels; together they form the intrinsic matrix.
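For reference, the standard pinhole relation between a camera-frame point (X_c, Y_c, Z_c) and its pixel coordinates (u, v), written in homogeneous form with the intrinsic matrix K, is:

$$
Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
=
\begin{bmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix}
= K \begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix}
$$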

The intrinsic parameters are generally stored in a .ini or .yaml file. The most famous calibration method is Zhang Zhengyou's method, whose core is solving the correspondences between a planar target and the image (closely related to PnP). Calibration toolboxes include the Autoware toolbox in ROS, MATLAB, and others.
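Below is a minimal sketch of Zhang-style checkerboard calibration with OpenCV. The 9×6 board with 25 mm squares and the calib/ image folder are assumptions for illustration; any planar checkerboard with known geometry works the same way.

```python
import glob
import cv2
import numpy as np

# assumed pattern: 9x6 inner corners, 25 mm squares
pattern = (9, 6)
square = 0.025
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_points, img_points = [], []
for path in glob.glob("calib/*.jpg"):        # hypothetical folder of board images
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

# K is the 3x3 intrinsic matrix, dist holds (k1, k2, p1, p2, k3)
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("reprojection RMS:", rms)
print("K =\n", K)
```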

We can obtain the extrinsic parameters from the rigid-body transformation applied to a point P; that is, the extrinsics are the R, t transformation between the camera coordinate system and the world coordinate system.

Cameras generally exhibit some distortion, which appears as pincushion or barrel distortion. It is usually caused by the lens or by how the lens is mounted, so we need to remove it. Distortion is divided into radial and tangential components. Radial distortion is caused by the lens: barrel distortion means the magnification decreases as the distance from the optical center increases, and pincushion distortion is the opposite. Tangential distortion is caused by assembly error, i.e. the lens and the imaging plane are not parallel.

For radial distortion:

$$
x_{distorted} = x\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6), \qquad
y_{distorted} = y\,(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)
$$

For tangential distortion:

$$
x_{distorted} = x + 2 p_1 x y + p_2 (r^2 + 2 x^2), \qquad
y_{distorted} = y + p_1 (r^2 + 2 y^2) + 2 p_2 x y
$$

Sometimes only the two radial parameters k1 and k2 are used, together with p1 and p2 for the tangential part, giving a four- or five-parameter distortion correction vector.

As for whether to apply the projection relationship first or remove the distortion first: in principle the order does not matter, but for convenience we usually undistort first and then apply the projection directly, without considering distortion any further (a small OpenCV sketch is given at the end of this subsection).

Here r is the radius in the normalized image plane (the polar radius), i.e. r = sqrt(x^2 + y^2).

The pixel plane then corresponds to:

$$
u = f_x \, x_{distorted} + c_x, \qquad v = f_y \, y_{distorted} + c_y
$$

Note: when six parameters are used, the radial coefficients in the x and y directions are not the same; only the low-order radial terms are kept, giving k1–k4 plus p1 and p2, six parameters in total.
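As a small illustration of the "undistort first, then project" workflow mentioned above, here is a hedged OpenCV sketch. The K and distortion values are placeholders, and frame.png is a hypothetical input image; in practice K and dist are loaded from the .ini/.yaml file produced by calibration.

```python
import cv2
import numpy as np

# placeholder intrinsics and distortion coefficients (k1, k2, p1, p2, k3)
K = np.array([[800.0, 0.0, 640.0],
              [0.0, 800.0, 360.0],
              [0.0, 0.0, 1.0]])
dist = np.array([-0.30, 0.10, 0.001, 0.001, 0.0])

img = cv2.imread("frame.png")                  # hypothetical input image
new_K, roi = cv2.getOptimalNewCameraMatrix(K, dist, img.shape[1::-1], 0)
undistorted = cv2.undistort(img, K, dist, None, new_K)
cv2.imwrite("frame_undistorted.png", undistorted)
```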

2.2 Lidar calibration (camera2Lidar)

Generally, lidar intrinsic calibration is done by the manufacturer before the device leaves the factory; it describes the relationship between each laser emitter and the sensor's own coordinate system.

What we need to calibrate is therefore the extrinsics between the lidar and the camera. There are generally two approaches: target-based and target-less, i.e. calibrating against a physical target such as a checkerboard, or aligning against agreed-upon features of the scene itself (such as a chosen tree). The latter approach is often used together with the IMU.

Using the calibration board, extract key points, collect more than four 3D-2D correspondences, and solve with a least-squares (PnP) solution.

There are also calibration methods based on RANSAC; these are all offline methods, and there are also many online calibration methods based on feature descriptors.
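For a rough idea of the target-based route, the following sketch solves the lidar-to-camera extrinsics with OpenCV's PnP + RANSAC. The correspondences here are simulated from a made-up ground-truth pose so the example is self-contained; in practice the 3D points would come from checkerboard corners (or other key points) extracted in the point cloud and the 2D points from the image.

```python
import cv2
import numpy as np

# placeholder camera intrinsics
K = np.array([[800.0, 0.0, 640.0],
              [0.0, 800.0, 360.0],
              [0.0, 0.0, 1.0]])

# made-up ground-truth extrinsics (lidar frame -> camera frame), for simulation only
rvec_gt = np.array([0.05, -0.02, 0.01]).reshape(3, 1)
tvec_gt = np.array([0.10, -0.30, 0.05]).reshape(3, 1)

# synthetic 3D key points in the lidar frame and their projections in the image
pts_lidar = np.random.uniform([-2, -1, 4], [2, 1, 10], size=(30, 3))
pts_pixel, _ = cv2.projectPoints(pts_lidar, rvec_gt, tvec_gt, K, None)

# solve for the extrinsics; RANSAC rejects outlier correspondences
ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts_lidar, pts_pixel, K, None)
R, _ = cv2.Rodrigues(rvec)
print("R =\n", R, "\nt =", tvec.ravel())
```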

The tools can again be MATLAB and Autoware, and of course methods from papers can be reproduced.

2.3 Radar calibration

Radar calibration is the more difficult part, because radar point clouds are generally of lower quality and much sparser than lidar point clouds, but the core is still solving the spatial correspondence. Structural (mounting) design is key. Generally, calibrating radar against lidar works better than calibrating radar against a camera.
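Since radar and lidar both produce 3D points, one simple way to think about the extrinsics is as 3D-to-3D registration of matched targets (e.g. corner reflectors seen by both sensors). Below is a minimal sketch of the closed-form SVD (Kabsch) solution; the matched points are synthetic, and real data would of course need outlier rejection on top.

```python
import numpy as np

def fit_rigid_transform(src, dst):
    """Least-squares R, t such that dst ≈ R @ src + t (Kabsch / SVD method)."""
    src_c, dst_c = src.mean(axis=0), dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:          # fix a possible reflection
        Vt[-1, :] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t

# toy matched targets: the same reflectors measured in the radar and lidar frames
radar_pts = np.random.rand(10, 3) * 20
R_true = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([0.5, 1.2, -0.3])
lidar_pts = radar_pts @ R_true.T + t_true

R_est, t_est = fit_rigid_transform(radar_pts, lidar_pts)
print(np.allclose(R_est, R_true), np.allclose(t_est, t_true))
```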

In addition, there are emerging deep-learning joint calibration methods such as CalibNet.

This part of the stack mainly involves PnP, feature points, point-cloud RANSAC, and so on.
