Joint calibration of camera and lidar (1)

In multi-sensor information fusion, cameras and 3D lidar have developed rapidly and been widely adopted because their characteristics are highly complementary. In many applications based on a camera and 3D lidar, the joint calibration of the camera intrinsic parameters and the camera-lidar extrinsic parameters is an important foundation for subsequent detection, tracking, and SLAM. As algorithms are updated and iterated, calibration accuracy, speed, and range have all improved, but a systematic and comprehensive survey of camera and 3D lidar calibration is still lacking.

This paper first compares the performance and application status of the two sensors; it then introduces the calibration process, the selection of calibration targets, and the establishment of the calibration model; next it presents, by category, the principles and algorithms of camera intrinsic calibration and joint extrinsic calibration; finally, it summarizes the field and discusses the future development of camera and 3D lidar calibration.

Ground unmanned platforms, represented by unmanned vehicles and robots, have broad application prospects in reconnaissance, safety inspection, and explosive-ordnance disposal tasks thanks to their autonomy, flexibility, and small size. In recent years there have been many studies on environment perception and scene understanding for unmanned platforms. Among the key technologies, obtaining target pose information is the essential prerequisite for an unmanned platform to achieve target perception and scene understanding, as shown in Figure 1.

At this stage, most unmanned platforms adopt a "lidar + vision camera" solution to achieve complete environmental perception. Lidar acquires high-precision, discrete 3D point clouds of the target surface; it is little affected by the environment and is therefore robust. Vision cameras acquire high-resolution, high-quality 2D visual information, which offers great advantages in environmental perception and target detection.

Since the information from the camera and the lidar is highly complementary, fusing the two not only overcomes the shortcomings of a single sensor in environmental perception but also yields richer target observations and improves perception accuracy, as shown in Figure 2 and Table I.

Because lidar data points are sparse while camera images are high-resolution, the joint calibration of lidar and camera is the fundamental step toward fusing the two. Many calibration methods have been proposed for this purpose. However, most of them are driven by specific application needs, so there is no systematic solution; open-source calibration datasets are scarce, and the various methods are not unified, making it difficult to give researchers a clear reference.

On this basis, this paper summarizes recent advanced methods for the joint calibration of a lidar and a single camera, focusing on the basic issues of calibration system construction, calibration board design, and calibration parameter solving, as shown in Figure 3 and Figure 4.

Construction of Calibration System

A typical calibration procedure can be divided into the following steps: first, select the lidar, vision camera, and calibration board according to the requirements; then establish the transformation relationships of the calibration target across the different coordinate systems; finally, solve for the transformation matrix. The selection of the calibration board, the establishment of the coordinate transformation relationships, and the solution of the calibration equations are the key links.

Figure 5 shows the four main coordinate systems involved in board-based joint calibration of lidar and camera: the pixel coordinate system (u, v), the image plane coordinate system (x, y), the camera coordinate system, and the lidar coordinate system.

The joint calibration of camera and lidar usually includes two parts: intrinsic calibration and extrinsic calibration. Intrinsic calibration mainly accounts for the distortion introduced by the camera's CCD sensor itself and its installation; extrinsic calibration establishes the coordinate transformation between the two sensors. Figure 6 shows the flow chart of the joint camera-lidar calibration.

The choice of calibration board depends on the calibration object and the feature points to be extracted. Calibration can be carried out in two ways: target-based or targetless. Target-based calibration usually uses a dedicated calibration board so that the algorithm can easily extract specific feature points, while targetless calibration directly extracts and matches material or structural features in the environment, which increases the range and convenience of calibration, as shown in Figure 7.

1) Target-based Calibration

The defining characteristic of target-based calibration is the use of a specific calibration board during the calibration process. The feature points are therefore easy to capture, the required algorithms are simple, and the computational load is small. The disadvantages are also obvious: first, the sensors must be calibrated before use and cannot be calibrated in real time; second, a two-dimensional planar board (such as a chessboard) must yield unambiguous correspondences, which not only makes the final accuracy heavily dependent on the 3D and 2D feature points but also requires manual intervention.

2D Calibration Board

A 2D calibration board relies on a specific calibration target, most commonly a checkerboard. Zhang [1] first proposed using a checkerboard in multiple poses to estimate the parameters between a 2D lidar and a camera. Ranjith Unnikrishnan et al. [2] extended Zhang's camera and 2D lidar method to the extrinsic calibration of a 3D lidar and a camera, as shown in Figure 8.
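A commonly used constraint in such checkerboard-based extrinsic calibration can be sketched as follows (a standard formulation, not necessarily the exact one used in [1], [2]): if the board plane has unit normal $\mathbf{n}$ and distance $d$ in the camera frame, both estimated from the detected corners and the camera intrinsics, then every lidar point $\mathbf{p}_L$ lying on the board must satisfy

$$\mathbf{n}^{\top}\left(\mathbf{R}\,\mathbf{p}_L + \mathbf{t}\right) = d,$$

where $(\mathbf{R}, \mathbf{t})$ is the lidar-to-camera transform; stacking this constraint over several board poses determines the extrinsic parameters.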

However, because of the large vertical spacing between lidar scan lines and the limited acquisition resolution, it is difficult to locate the board vertices precisely through edge extraction. Lyu [3] manually adjusted the position of the chessboard in the laser point cloud so that a scan line passes through the board vertices, but this increases the time and complexity of the calibration process.

Also aiming at the problem of inaccurate point cloud edge fitting, Kang Guohua et al. [4] used coarse registration of the point cloud centers to achieve fine registration of the whole point cloud. From another point of view, besides the coordinates of the 3D points, lidar also returns reflection intensity information. The authors of [5] thresholded the lidar reflection intensity of different materials to segment the point cloud, and designed a new calibration board on this basis.
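As a minimal sketch of such intensity-based segmentation (the array layout and the threshold value are illustrative assumptions, not taken from [5]):

```python
import numpy as np

def filter_by_intensity(points: np.ndarray, threshold: float = 0.8) -> np.ndarray:
    """Keep only lidar returns whose reflection intensity exceeds a threshold.

    points: (N, 4) array of [x, y, z, intensity]; a high threshold isolates
    strongly reflective regions of the calibration board.
    """
    return points[points[:, 3] > threshold]

# Example with a few fake returns: keep the highly reflective ones.
cloud = np.array([[1.0, 0.2, 0.1, 0.90],
                  [1.1, 0.3, 0.1, 0.20],
                  [1.2, 0.1, 0.0, 0.85],
                  [1.3, 0.0, 0.2, 0.40]])
print(filter_by_intensity(cloud))
```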

ArUco markers use a special coding scheme that supports detection and correction of errors on the marker itself. Hall et al. [6] present a calibration setup consisting of multiple board units: ArUco markers are used to compute the corner points of the calibration boards in the camera coordinate system, edge lines are fitted to the lidar points, and the lidar-camera extrinsic parameters are then calculated.
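A minimal sketch of the ArUco side of such a pipeline, using OpenCV's contrib ArUco module (the classic cv2.aruco API; newer releases expose cv2.aruco.ArucoDetector instead). The dictionary, marker size, file name, and placeholder intrinsics are assumptions, not values from [6]:

```python
import cv2
import numpy as np

K = np.array([[900.0, 0.0, 640.0],
              [0.0, 900.0, 360.0],
              [0.0, 0.0, 1.0]])   # camera intrinsics from a prior calibration (placeholder)
dist = np.zeros(5)                 # distortion coefficients (placeholder)
marker_len = 0.15                  # marker side length in metres (assumed)

img = cv2.imread("calib_frame.png")
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_6X6_250)
corners, ids, _ = cv2.aruco.detectMarkers(gray, aruco_dict)

# 3D corners of a single marker in its own plane (z = 0).
obj = np.array([[0, 0, 0], [marker_len, 0, 0],
                [marker_len, marker_len, 0], [0, marker_len, 0]], dtype=np.float64)

if ids is not None:
    for c in corners:  # c: (1, 4, 2) detected pixel corners of one marker
        ok, rvec, tvec = cv2.solvePnP(obj, c.reshape(4, 2), K, dist)
        # rvec, tvec give the marker pose (and hence the board corners) in the camera frame.
```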

In practice, this approach suffers from the same problems as checkerboard-based lidar calibration, namely unstable edge extraction and line fitting, which introduce large calibration errors. Furthermore, the method is more complex because multiple calibration boards are needed. To give the lidar point cloud more extractable features, hollow calibration boards have appeared. The works of Dong et al. [7] and Zhung et al. [8] are similar: they cut a circular hole in the center of a black-and-white checkerboard and used the center of the hole as the feature point. However, since the lidar point cloud has no distinct feature at the center, its coordinates can only be approximated from the point cloud of the whole hole, as shown in Fig. 9.

Recently, Huang Qiang et al. [9] proposed a joint calibration method for lidar and vision sensors based on a reconfigurable calibration board. It uses a barcode-like pattern to automatically identify the lidar feature points, and adds a camera verification mechanism to mitigate errors caused by unstable camera recognition during calibration.

3D Calibration Board

The purpose of a 3D calibration board is to help the lidar find feature points and thus match them more reliably. When the background is cluttered, the hollow boards described in the previous section may produce mixed pixels. Zhou Shihui et al. [10] analyzed the mixed-pixel error produced by planar hollow calibration boards, designed a special grid-shaped calibration board, and achieved high-precision joint calibration of an industrial camera and lidar through feature point matching, reaching pixel-level image fusion. Cai Huaiyu et al. [11] designed a calibration board (BWDC) that carries gradient depth, plane angle, and position information.

Such boards go beyond extracting features in a single dimension and effectively exploit the lidar's ability to capture 3D information. However, they place high demands on the design and manufacturing precision of the calibration board, which increases cost and introduces error, as shown in Figure 10.

Pusztai et al. [12] used a box with three perpendicular faces as the calibration object, estimated the 3D vertices of the box by extracting its edges, and matched them with the 2D vertices extracted from the image. The method is general because ordinary boxes can be used. Similarly, Xiaojin Gong et al. [13] proposed extrinsic calibration of a 3D lidar and camera based on an arbitrary trihedron. Since the trihedral target can be orthogonal or non-orthogonal, and such structures often occur in man-made environments, the method has a wide range of applications.

2) Targetless Calibration

Targetless calibration does not require a dedicated calibration target and is more convenient for online calibration, but its accuracy is lower and its robustness poorer. Online calibration is nevertheless needed in practice: the relative pose of rigidly mounted sensors drifts because of mechanical vibration, so the calibration parameters become inaccurate over time. Since most fusion methods are extremely sensitive to calibration errors, their performance and reliability are severely compromised. In addition, most calibration procedures must start from scratch, so continuously updating the parameters by hand is cumbersome and impractical. To get rid of the limitations of the calibration board and achieve online calibration from natural scenes, researchers have carried out extensive work.

Some of these methods exploit the correlation between RGB texture and lidar reflectance, while others extract edge [17] or line [18] features from the image and the laser point cloud and measure their correlation. The work in [19] used the natural alignment of depth and intensity edges, combined with a Gaussian mixture model, to obtain an automatic, target-free, fully data-driven global matching optimization method.

These methods require relatively precise initial parameters, otherwise the optimization easily falls into local extrema, so scene-based targetless methods are usually used only to fine-tune the extrinsic parameters. There are also methods based on odometry trajectory registration [20], on registering dense point clouds reconstructed from image sequences against the laser point cloud [21], and even on deep learning [22], [23]. These methods are not only highly dependent on the environment but are also affected by the accuracy of the visual or laser odometry. Current techniques have low accuracy and limited generality, and need further research and development.

3) Calibration Equation Establishment

During camera image acquisition, a geometric model of camera imaging must be established in order to determine the relationship between the 3D position of a point on the surface of an object and its corresponding point in the image.

(1) Three-dimensional coordinate conversion: the conversion from the camera coordinate system to the lidar coordinate system, the formula is as follows:
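In standard form (assuming $\mathbf{R}$ and $\mathbf{t}$ denote the rotation and translation from the camera coordinate system to the lidar coordinate system):

$$\begin{bmatrix} X_L \\ Y_L \\ Z_L \end{bmatrix} = \mathbf{R}\begin{bmatrix} X_C \\ Y_C \\ Z_C \end{bmatrix} + \mathbf{t}$$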

(2) Two-dimensional coordinate transformation: transformation from pixel coordinate system to image coordinate system, the formula is as follows:
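In standard form (assuming $\mathrm{d}x$ and $\mathrm{d}y$ are the physical size of one pixel along the two axes and $(u_0, v_0)$ is the principal point):

$$x = (u - u_0)\,\mathrm{d}x, \qquad y = (v - v_0)\,\mathrm{d}y$$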

(3) Coordinate transformation based on the principle of pinhole imaging: the transformation from the camera coordinate system to the image coordinate system, the formula is as follows:
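In standard form (assuming $f$ is the focal length and $(X_C, Y_C, Z_C)$ are the coordinates of the point in the camera frame):

$$x = f\,\frac{X_C}{Z_C}, \qquad y = f\,\frac{Y_C}{Z_C}$$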

(4) Combined conversion: the conversion from the lidar coordinate system to the pixel coordinate system is combined and derived as shown in Figure 11, and the formula is as follows:
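In standard form, combining the three transformations above (here $(\mathbf{R}_{LC}, \mathbf{t}_{LC})$ denotes the transform from the lidar frame to the camera frame, i.e. the inverse of the transform in (1)):

$$Z_C\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f/\mathrm{d}x & 0 & u_0 \\ 0 & f/\mathrm{d}y & v_0 \\ 0 & 0 & 1 \end{bmatrix}\begin{bmatrix} \mathbf{R}_{LC} & \mathbf{t}_{LC} \end{bmatrix}\begin{bmatrix} X_L \\ Y_L \\ Z_L \\ 1 \end{bmatrix}$$

The 3x3 matrix contains the camera internal parameters, and $[\mathbf{R}_{LC} \mid \mathbf{t}_{LC}]$ contains the external parameters.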

Parameter Solving

1) Camera internal parameters

To simplify the calculation, an ideal linear pinhole camera is assumed, as shown in Fig. 12(a). However, because of limitations in lens precision and assembly, camera imaging inevitably introduces distortion, producing distorted images as shown in Fig. 12(b); the undistorted scene and two common types of camera distortion are shown in Fig. 13.
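A commonly used model for this is the Brown-Conrady model with radial coefficients $k_1, k_2$ and tangential coefficients $p_1, p_2$ (the standard form, given here as a sketch rather than the exact model in the original figures). On normalized image coordinates $(x, y)$ with $r^2 = x^2 + y^2$:

$$x_d = x\,(1 + k_1 r^2 + k_2 r^4) + 2 p_1 x y + p_2\,(r^2 + 2x^2)$$
$$y_d = y\,(1 + k_1 r^2 + k_2 r^4) + p_1\,(r^2 + 2y^2) + 2 p_2 x y$$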

Internal parameters are among the most important calibration parameters in vision measurement; they describe the precise correspondence between spatial points and their imaged points in computer image coordinates, as shown in Figure 14.

Traditional camera calibration methods. Traditional methods use a calibration target or a 3D calibration field of known structure and high precision as the spatial reference. After the world coordinate system is established, the spatial coordinates of each feature point are obtained, and constraints on the camera's internal parameters are derived from the correspondences between the spatial points and the image points.

Finally, the internal parameters are obtained through an optimization algorithm. The most widely used method for computing camera internal parameters is Zhang Zhengyou's method [24]. Inspired by it, many open-source libraries such as OpenCV, ROS, MATLAB, and various toolboxes [25][26] implement this method for calibration.
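As an illustration, a minimal OpenCV sketch of Zhang's method (the board dimensions, square size, and image path pattern are assumptions):

```python
import glob
import cv2
import numpy as np

pattern = (9, 6)    # inner corners per row and column (assumed board)
square = 0.025      # square edge length in metres (assumed)

# 3D coordinates of the board corners in the board's own plane (z = 0).
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square

obj_pts, img_pts = [], []
for path in glob.glob("calib/*.png"):           # images of the board in many poses
    gray = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_pts.append(objp)
        img_pts.append(corners)

# Estimate the intrinsic matrix K and the distortion coefficients.
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_pts, img_pts, gray.shape[::-1], None, None)
print("RMS reprojection error:", rms)
```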

Camera self-calibration method. Camera self-calibration establishes correspondences from a set of overlapping images captured by the camera and completes the geometric calibration without relying on a calibration reference object. Because only the camera's own constraints are used, the method is independent of the relative motion between the camera and its surroundings, which gives it great flexibility in situations where a calibration target cannot be used because of harsh conditions.

Early self-calibration methods [27], [28], [29] solve the Kruppa equations using properties of the absolute conic and epipolar transformations. Directly solving the Kruppa equations involves too many optimization parameters, so the solution easily falls into local optima; when image noise is large, calibration accuracy drops and robustness is poor. Such methods have largely been replaced by hierarchical, step-by-step calibration.

Hierarchical calibration starts from a projective calibration, selects one image as the reference, and performs projective alignment to reduce the number of unknowns; a nonlinear optimization algorithm then solves for all remaining unknowns [30]. Another branch is self-calibration based on active vision. In an active vision system, the camera is precisely mounted on a controllable platform and moved along specified trajectories while capturing images; the known motion parameters are then used to determine the internal parameters [31], [32]. Its disadvantages are the demanding experimental equipment, strong restrictions on the motion model, and poor robustness to noise.

Method based on vanishing point calibration. Since the 1990s, many scholars have studied calibration based on vanishing points. Geometrically, the vanishing point of a world line is the intersection of the image plane with the ray that is parallel to the line and passes through the camera center; the vanishing point therefore depends only on the line's direction, not on its position [33], [34], [35]. Vanishing-point calibration requires no object control points; instead it builds a model from the constraint relationships among the camera's own parameters. This greatly improves the flexibility of calibration, enables real-time online calibration, and has a wide range of applications.
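The underlying constraint can be written compactly: if $\mathbf{v}_i$ and $\mathbf{v}_j$ are the vanishing points of two mutually orthogonal directions and $\mathbf{K}$ is the intrinsic matrix, then

$$\mathbf{v}_i^{\top}\,\boldsymbol{\omega}\,\mathbf{v}_j = 0, \qquad \boldsymbol{\omega} = \mathbf{K}^{-\top}\mathbf{K}^{-1},$$

where $\boldsymbol{\omega}$ is the image of the absolute conic; three mutually orthogonal vanishing points therefore provide three independent constraints on $\mathbf{K}$.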

Its disadvantages are poor robustness and low precision: most vanishing-point methods calibrate the camera's interior orientation elements from the vanishing points of a single image, and traditional algorithms require three sets of mutually orthogonal parallel lines in the scene, a condition that noise and other practical factors often make hard to satisfy. The internal and distortion parameters are inherent to the camera and are usually fixed once it leaves the factory, so many manufacturers now provide the internal parameters directly. At the same time, as camera designs are optimized, distortion is increasingly well controlled. This is why research on internal parameter calibration algorithms has been updated relatively slowly.

2) External parameter solution

From the perspective of how data are selected, external calibration can be divided into manual and automatic methods. Manual calibration was the first to be developed and used, and it is favored for its simple, effective operation and high precision. However, as the volume of image and point cloud data has grown enormously, labor costs have risen sharply, and manual calibration alone can no longer meet the demands for speed, automation, and robustness, so automatic calibration has emerged. Figure 15 groups camera-lidar external calibration methods by their underlying principles.

Manual external parameter calibration

For camera-lidar external calibration, the calibration object provides accurate geometric dimensions, while manual operation provides accurate correspondences. The most common approach is to use a chessboard to determine a series of 3D-2D point pairs for calibration. For example, Dhall et al. [36] manually selected 3D points and solved with least squares, while Scaramuzza et al. [37] proposed a fast calibration method that does not depend on a calibration object but is based on point features.

In that method, the authors manually select corresponding points at depth discontinuities from the camera image and the lidar point cloud and solve from these correspondences. There are many manual camera-lidar calibration methods, and different methods achieve high precision in their specific application scenarios. For systems that require real-time online calibration, however, the main problem with manual calibration is its heavy reliance on human operation and special calibration objects, which seriously limits autonomy and intelligence.
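A minimal sketch of this kind of manual point-pair calibration with OpenCV's PnP solver (the point values, intrinsics, and distortion below are placeholders, not data from [36], [37]):

```python
import cv2
import numpy as np

# Hand-picked 3D points in the lidar frame (metres) and their pixel correspondences.
pts_lidar = np.array([[4.12, 0.31, -0.42], [4.08, -0.55, -0.40],
                      [3.95, 0.02, 0.61], [5.10, 1.20, -0.10],
                      [5.05, -1.05, 0.35], [6.20, 0.40, 0.80]], dtype=np.float64)
pts_pixel = np.array([[612.0, 384.0], [702.0, 381.0],
                      [655.0, 250.0], [530.0, 402.0],
                      [760.0, 310.0], [620.0, 200.0]], dtype=np.float64)

K = np.array([[900.0, 0.0, 640.0],
              [0.0, 900.0, 360.0],
              [0.0, 0.0, 1.0]])   # camera intrinsics (placeholder)
dist = np.zeros(5)

ok, rvec, tvec = cv2.solvePnP(pts_lidar, pts_pixel, K, dist,
                              flags=cv2.SOLVEPNP_ITERATIVE)
R, _ = cv2.Rodrigues(rvec)        # rotation from the lidar frame to the camera frame
print(R, tvec)
```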

Automatic External Parameter Calibration

Driven by the urgent needs of intelligent applications, a large number of automatic camera-lidar external calibration methods have appeared over the past decade. They can be classified into feature-matching-based, mutual-information-based, motion-based, and deep-learning-based methods.

a) Calibration method based on feature matching

Feature-matching-based methods select feature points, often with the aid of a calibration board, and obtain their coordinates in the two sensor coordinate systems. From the matched features, the transformation between the lidar coordinate system and the camera pixel coordinate system is obtained directly, and the transformation matrix and the internal and external parameters are then computed by solving the calibration equations or by supervised learning. Feature-matching methods based on lidar point cloud edge extraction currently fall into two types: indirect and direct.

Indirect methods: most of these methods convert the lidar points into an image before extracting edges. Wang et al. [39] used the Canny algorithm to extract edges from the camera image, used a boundary detector to generate a range image from the 3D lidar point cloud, and established the correspondence between the point cloud and the range image. The 2D image and the 3D point cloud are then merged through pixel correspondences to obtain edge images carrying 3D information. Because the resolutions of the camera and the point cloud differ, this conversion may introduce some error.
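A rough sketch of the indirect pipeline under assumed sensor parameters (the image file, beam count, horizontal resolution, and vertical field of view are illustrative, not the settings of [39]):

```python
import cv2
import numpy as np

def range_image(points, h=64, w=1024, fov_up=np.deg2rad(2.0), fov_down=np.deg2rad(-24.8)):
    """Project an (N, 3) lidar cloud into an h x w range image via spherical projection."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(y, x)                              # azimuth angle
    pitch = np.arcsin(z / np.maximum(r, 1e-6))          # elevation angle
    u = ((yaw + np.pi) / (2 * np.pi) * w).astype(int) % w
    v = np.clip(((fov_up - pitch) / (fov_up - fov_down) * h).astype(int), 0, h - 1)
    out = np.zeros((h, w), np.float32)
    out[v, u] = r
    return out

# Edges in the camera image.
img = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
if img is not None:
    img_edges = cv2.Canny(img, 50, 150)

# Edges in the lidar range image (placeholder cloud for illustration).
rng = range_image(np.random.rand(1000, 3) * 20.0 - 10.0)
rng_u8 = cv2.normalize(rng, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
rng_edges = cv2.Canny(rng_u8, 50, 150)
```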

Direct method: in an image, edges form where the gray value changes abruptly, and this change is described by the image gradient. By analogy with image edges, Xia et al. [40] proposed the concept of a 3D point cloud gradient and used it for fast edge extraction directly on the lidar point cloud. This approach extracts more edge features, achieves higher precision, and is insensitive to point density.

b) Calibration method based on mutual information

The main idea of mutual-information-based methods is to find correlated variables between the camera image and the lidar point cloud. The extrinsic calibration parameters are optimized by computing the mutual information between the grayscale values of the camera image and a related lidar variable (such as reflectivity or the reflection angle) and then maximizing it. Pandey et al. [41] observed a strong correlation between lidar reflectivity and image gray value, and estimated the probability distributions of reflectivity and gray value by kernel density estimation.

The mutual information is then maximized with the Barzilai-Borwein steepest gradient method to optimize the external parameters. Taylor et al. [42] argue that the angle between the camera viewing direction and the surface normal at the 3D point corresponding to an image point affects the reflection intensity, and perform calibration by computing the mutual information between the lidar reflectivity and this angle. Clearly, precise extrinsic parameters between the two sensors are especially important when they perform tight data fusion.
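A minimal histogram-based sketch of the mutual-information objective (the cited work [41] uses kernel density estimation and Barzilai-Borwein optimization; the bin count and random inputs here are placeholders):

```python
import numpy as np

def mutual_information(gray_vals, refl_vals, bins=64):
    """Estimate MI between image gray values sampled at projected lidar points
    and the corresponding lidar reflectivity values, via a joint histogram."""
    joint, _, _ = np.histogram2d(gray_vals, refl_vals, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])))

# The extrinsics are found by maximizing this score over (R, t): for each candidate,
# re-project the point cloud, re-sample the gray values, and re-evaluate the MI.
mi = mutual_information(np.random.rand(5000), np.random.rand(5000))
print(mi)
```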

Because the lidar point cloud is sparse, lidar points falling exactly on the corners of the calibration board cannot be extracted directly, which leads to larger errors and makes accurate corner computation difficult. External calibration methods based on directly extracted reflection-intensity feature points can effectively compensate for this problem. However, the effectiveness of mutual-information-based methods depends on the material and reflectivity of the objects in the environment, and is easily disturbed by factors such as lighting conditions and weather changes.

c) Motion-based calibration method

Motion-based methods usually solve for the extrinsic parameters from continuous motion data or multi-view data. First, the motions of the camera and the lidar are estimated separately; the extrinsic parameters between them are then obtained by hand-eye calibration. Taylor et al. [43] proposed a set of calibration methods applicable to any system consisting of lidar, cameras, and navigation sensors: consecutive frames from each sensor are registered first, and the concatenated motions are then used to calibrate the extrinsic parameters. Ishikawa et al. [44] used a KLT tracker to follow the projections of lidar points on the image plane and iteratively refined the translation vector, which lacks 3D scale for a monocular camera, through the projection error, achieving more accurate results than Taylor. Zhao [45] reconstructed the urban scene in 3D from consecutive camera frames to compute the pose change, and applied ICP to consecutive lidar point clouds to obtain the corresponding spatial change.
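The hand-eye formulation underlying these methods can be sketched as follows: if $\mathbf{A}_k$ is the camera motion between two frames, $\mathbf{B}_k$ the lidar motion over the same interval (both as 4x4 rigid transforms), and $\mathbf{X}$ the unknown camera-lidar extrinsic transform, then each motion pair must satisfy

$$\mathbf{A}_k\,\mathbf{X} = \mathbf{X}\,\mathbf{B}_k,$$

and stacking many motion pairs yields an over-determined system from which $\mathbf{X}$ is solved.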

There are also many studies that register images with lidar point clouds using techniques such as keyframes and structure from motion. For example, Nedevschi et al. [46] used feature matching to detect sensor drift while the vehicle is driving: the method finds edges in the image through a distance transform, converts the lidar point cloud into a range image, and builds an objective function from the edge information. It can eliminate the drift produced during vehicle motion and automatically adjust the external parameters.

Motion-based methods are therefore suitable for obtaining initial values of the calibration parameters and for large-scale scenes. Their main disadvantage is that monocular camera motion lacks 3D scale information, so the pose estimate deviates considerably from the data and the accuracy still needs a breakthrough. Because stereo vision can recover scale, the calibration problem is greatly simplified, so this approach is better suited to stereo cameras.

d) Calibration method based on deep learning

Deep learning has developed rapidly in recent years. Many perception tasks in autonomous driving can be handled well by deep learning, and the external calibration of lidar and camera can likewise be predicted by neural networks. Schneider et al. [47] proposed the first method to apply deep learning to the calibration problem: the RegNet network extracts features from the camera image and the lidar point cloud separately and then regresses the calibration.

Ganesh Iyer et al. [48] designed a geometrically supervised deep network, CalibNet, that automatically estimates the 6-DoF rigid-body transformation between a 3D lidar and a 2D camera in real time; it reduces the need for calibration targets and greatly reduces calibration effort. Recently, Kaiwen Yuan et al. [49] proposed a lidar-camera calibration method based on RGGNet, which takes Riemannian geometry into account and learns an implicit tolerance model with a deep generative model; it considers not only the calibration error but also the tolerance within the error range, and achieves good calibration results. With deep neural networks there is no need to extract features from the images and lidar point clouds in advance to establish the mapping between the two kinds of data.

Instead, the network itself discovers the latent relationships. For calibration problems, supervised learning alone is clearly insufficient: ground-truth extrinsic parameters are hard to obtain directly, and it is difficult to build a reliable, sizable training set. Unsupervised or semi-supervised learning is therefore better suited to external calibration. However, existing algorithms impose strict conditions on their use, training requires a huge amount of computation, and generalization ability urgently needs improvement.

Reference

[1] Review of a 3D lidar combined with single vision calibration.
