Fundamentals of Camera Calibration

This article will briefly introduce the theoretical basis of camera calibration, laying a solid foundation for subsequent practice.

1. Camera internal parameter calibration

1.1 Ideal camera imaging principle

In the ideal case, camera imaging is usually described by the linear pinhole model. Following the four coordinate systems and their relationships introduced in the previous article, suppose the world coordinates of a three-dimensional point P in space are (X_w,Y_w,Z_w,1)^T, and the pixel coordinates of its corresponding projected point p are (u,v,1)^T. A rotation matrix R represents the rotational motion, a translation vector t represents the translational motion, and the projection matrix M represents the projective transformation between the three-dimensional information in space and the two-dimensional information of the image. Under the ideal camera imaging model, the relationship between the world coordinates of the 3D point P and the pixel coordinates of its projection p is:

s\left [ \begin{matrix} u\\ v\\ 1 \end{matrix} \right ]=\left [ \begin{matrix} \frac{1}{dX} & 0 & u_0\\ 0 & \frac{1}{dY} & v_0\\ 0 & 0 & 1 \end{matrix} \right ]\left [ \begin{matrix} f & 0 & 0 & 0\\ 0 & f & 0 & 0\\ 0 & 0 & 1 & 0 \end{matrix} \right ]\left [ \begin{matrix} R & t\\ 0 & 1 \end{matrix} \right ]\left [ \begin{matrix} X_w\\ Y_w\\ Z_w\\ 1 \end{matrix} \right ]=\left [ \begin{matrix} f_x & 0 & u_0 & 0\\ 0 & f_y & v_0 & 0\\ 0 & 0 & 1 & 0 \end{matrix} \right ]\left [ \begin{matrix} R & t\\ 0 & 1 \end{matrix} \right ]\left [ \begin{matrix} X_w\\ Y_w\\ Z_w\\ 1 \end{matrix} \right ]=KT\left [ \begin{matrix} X_w\\ Y_w\\ Z_w\\ 1 \end{matrix} \right ]=M\left [ \begin{matrix} X_w\\ Y_w\\ Z_w\\ 1 \end{matrix} \right ]

Among them, f_x and f_y are the focal lengths of the camera expressed in pixel units (f_x = f/dX, f_y = f/dY), K is the internal parameter matrix of the camera, and T is the external parameter matrix of the camera.
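
To make the projection pipeline concrete, the following minimal sketch evaluates s(u, v, 1)^T = K[R | t](X_w, Y_w, Z_w, 1)^T with NumPy. The intrinsic and extrinsic values used here are assumptions chosen only for illustration.

```python
import numpy as np

# Assumed intrinsic matrix K (focal lengths f_x, f_y and principal point (u_0, v_0) in pixels)
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Assumed extrinsics: identity rotation R and a small translation t
R = np.eye(3)
t = np.array([[0.1], [0.0], [2.0]])

# Homogeneous world coordinates (X_w, Y_w, Z_w, 1)^T of the point P
P_w = np.array([[0.5], [0.3], [1.0], [1.0]])

# Projection matrix M = K [R | t]
M = K @ np.hstack([R, t])

# s (u, v, 1)^T = M P_w; dividing by the scale s gives the pixel coordinates
uvs = M @ P_w
u, v = (uvs[:2] / uvs[2]).ravel()
print(f"pixel coordinates: ({u:.2f}, {v:.2f})")
```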

1.2 Camera nonlinear model and distortion correction

In the actual imaging process, because of machining errors in the camera lens and deviations introduced during camera assembly, the real imaging behaviour deviates from the ideal linear model. A distortion model is therefore added on top of the ideal imaging model, forming a nonlinear camera imaging model. Although the nonlinear model is more complex than the ideal linear model, it is closer to how the camera really images. Put simply, camera distortion means that a straight line in three-dimensional space is no longer a straight line after it is projected onto the image. As shown in Figure 1, there is a deviation on the image plane between the actual projection point and the ideal projection point of a three-dimensional point in space. Camera distortion is generally divided into radial distortion and tangential distortion.

Figure 1. Schematic diagram of camera distortion

1. Radial Distortion

Figure 2. Schematic diagram of radial distortion

 

As shown in Figure 2, radial distortion causes straight lines in the scene to appear more curved near the edges of the image than near its center. The radial distortion at the optical center of the lens is zero, and it becomes more severe the closer a point is to the image edge. Radial distortion can be corrected with the following Taylor series expansion:

x_{c,r}=x(1+k_1r^2+k_2r^4+k_3r^6)

y_{c,r}=y(1+k_1r^2+k_2r^4+k_3r^6)

Among them, (x,y) are the image-plane coordinates of a point on the distorted image plane (that is, the original image-plane coordinates), (x_{c,r},y_{c,r}) are the coordinates of the point after the radial distortion has been removed, and r is the distance from the point to the imaging center.

2. Tangential Distortion

Tangential distortion is caused by the fact that the camera lens cannot be strictly parallel to the imaging plane during camera assembly. Tangential distortion can be corrected by the following equation:

x_{c,t}=x+[2p_1xy+p_2(r^2+2x^2)]

y_{c,t}=y+[2p_2xy+p_1(r^2+2y^2)]

Among them, (x_{c,t},y_{c,t}) are the coordinates of the point after the tangential distortion has been removed.

According to the above, for a three-dimensional point P in space whose projected point p has image-plane coordinates (x,y), the coordinates (x_{corrected},y_{corrected}) after correcting both radial and tangential distortion, and the corresponding pixel coordinates (u,v), are:

x_{corrected}=x(1+k_1r^2+k_2r^4+k_3r^6)+[2p_1xy+p_2(r^2+2x^2)]

y_{corrected}=y(1+k_1r^2+k_2r^4+k_3r^6)+[2p_2xy+p_1(r^2+2y^2)]

u=f_xx_{corrected}+u_0

v=f_yy_{corrected}+v_0

Among them, the five distortion coefficients form the distortion coefficient vector D=(k_1,k_2,p_1,p_2,k_3).
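
The sketch below simply evaluates the formulas above, mapping image-plane coordinates through the distortion model to pixel coordinates. The coefficient values in the example call are assumptions used only for illustration.

```python
import numpy as np

def distort_to_pixels(x, y, k1, k2, k3, p1, p2, fx, fy, u0, v0):
    """Evaluate the radial + tangential distortion formulas above for
    image-plane coordinates (x, y), then map the result to pixel coordinates."""
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2 ** 2 + k3 * r2 ** 3
    x_corrected = x * radial + (2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x))
    y_corrected = y * radial + (2.0 * p2 * x * y + p1 * (r2 + 2.0 * y * y))
    u = fx * x_corrected + u0
    v = fy * y_corrected + v0
    return u, v

# Example call; the coefficients D = (k1, k2, p1, p2, k3) are assumed values
u, v = distort_to_pixels(0.10, -0.05,
                         k1=-0.28, k2=0.07, k3=0.0, p1=1e-3, p2=-5e-4,
                         fx=800.0, fy=800.0, u0=320.0, v0=240.0)
print(u, v)
```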

1.3 Camera calibration method based on checkerboard calibration board

Existing camera calibration methods mainly use projective geometry to relate three-dimensional information in space to the two-dimensional information in the images captured by the camera, and thereby compute the camera's internal and external parameters. Zhang Zhengyou's calibration method is a classic camera calibration method based on a checkerboard calibration board. Following the previous section, the camera imaging model can be simplified as:

Z_c \left[ \begin {matrix} u\\ v\\ 1 \end {matrix} \right] = M\left [ \begin{matrix} X_w\\ Y_w\\ Z_w\\ 1 \end{matrix} \right ]

Assume that the origin of the world coordinate system is placed on the checkerboard calibration board, and that the plane of the board is taken as the X_wO_wY_w plane of the world coordinate system (so Z_w = 0). The world coordinates of a feature point on the checkerboard can then be expressed as (X_w,Y_w,0,1)^T. Assuming that the pixel coordinates of the corresponding projection point are (u,v,1)^T, the relationship between the two is:

Z_c\left [ \begin{matrix} u\\ v\\ 1 \end{matrix} \right ]=H\left [ \begin{matrix} X_w\\ Y_w\\ 1 \end{matrix} \right ]

Suppose H=\left [ \begin{matrix} \vec{h_1} & \vec{h_2} & \vec{h_3} \end{matrix} \right ], combining the above two formulas, we can get:

\left [ \begin{matrix} \vec{h_1} & \vec{h_2} & \vec{h_3} \end{matrix} \right ]=\lambda K\left [ \begin{matrix} \vec{r_1} & \vec{r_2} & t \end{matrix} \right ]

where \lambda is a nonzero scale constant and \vec{r_1}, \vec{r_2} are the first two column vectors of the rotation matrix R (\vec{r_3} does not appear because Z_w = 0).

Since the rotation matrix R has two properties:

(1) The column vectors of the rotation matrix R are orthogonal to each other, that is, \vec{r_1}^{T}\vec{r_2}=0;

(2) The column vectors of the rotation matrix R all have modulus 1, that is, \left | \vec{r_1} \right |=\left | \vec{r_2} \right |=1. The following relationships therefore hold:

\vec{h_1}^{T}K^{-T}K^{-1}\vec{h_2}=0

\vec{h_1}^{T}K^{-T}K^{-1}\vec{h_1}=\vec{h_2}^{T}K^{-T}K^{-1}\vec {h_2}

Suppose B=K^{-T}K^{-1}=\left [ \begin{matrix} b_{11} & b_{12} & b_{13}\\ b_{21} & b_{22} & b_{23}\\ b_{31} & b_{32} & b_{33} \end{matrix} \right ]

At this point the skew factor \gamma, which describes the skew between the two image axes, is also taken into account, so the camera intrinsic parameter matrix can be expressed as:

K=\left [ \begin{matrix} f_u & \gamma & u_0 \\ 0 & f_v & v_0\\ 0 & 0 & 1 \end{matrix} \right ]

Substituting K into B gives:

B=K^{-T}K^{-1}=\left [ \begin{matrix} \frac{1}{f_{u}^{2}} & -\frac{\gamma }{f_{u}^{2}f_v} & \frac{\gamma v_0-u_0f_v}{f_{u}^{2}f_v}\\ -\frac{\gamma }{f_{u}^{2}f_v} & \frac{\gamma^2 }{f_{u}^{2}f_{v}^{2}}+\frac{1}{f_{v}^{2}} & -\frac{\gamma (\gamma v_0-u_0f_v)}{f_{u}^{2}f_{v}^{2}}-\frac{v_0}{f_{v}^{2}}\\ \frac{\gamma v_0-u_0f_v}{f_{u}^{2}f_v} & -\frac{\gamma (\gamma v_0-u_0f_v)}{f_{u}^{2}f_{v}^{2}}-\frac{v_0}{f_{v}^{2}} & \frac{(\gamma v_0-u_0f_v)^2}{f_{u}^{2}f_{v}^{2}}+\frac{v_{0}^{2}}{f_{v}^{2}}+1 \end{matrix} \right ]

It can be seen from the above formula that B is a symmetric matrix, so B has only 6 independent parameters, and it can be rewritten as a 6-dimensional vector:

\vec{b}=\left [ \begin{matrix} b_{11} & b_{12} & b_{22} & b_{13} & b_{23} &b_{33} \end{matrix} \right ]^T

Let the i-th column vector of H be \vec{h_i}=\left [ \begin{matrix} h_{i1} & h_{i2} & h_{i3} \end{matrix} \right ]^T; then

\vec{h_i}^TB\vec{h_j}=c_{ij}^{T}\vec{b}

where c_{ij}=\left [ \begin{matrix} h_{i1}h_{j1}, & h_{i1}h_{j2}+h_{i2}h_{j1}, & h_{i2}h_{j2}, & h_{i3}h_{j1}+h_{i1}h_{j3}, & h_{i3}h_{j2}+h_{i2}h_{j3}, & h_{i3}h_{j3} \end{matrix} \right ]^T

Combining the two properties of the rotation matrix R above, we can get:

\left [ \begin{matrix} c_{12}^{T} \\ (c_{11}-c_{22})^T \end{matrix} \right ]\vec{b}=C\vec{b}=0
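
As a minimal sketch of this step (assuming the homography of each calibration image has already been estimated, for example with cv2.findHomography), the two constraints above can be stacked for every image and \vec{b} recovered as the null-space direction of C via SVD:

```python
import numpy as np

def c_vec(H, i, j):
    """Build the 6-vector c_ij from the i-th and j-th columns of H (0-based indices)."""
    hi, hj = H[:, i], H[:, j]
    return np.array([hi[0] * hj[0],
                     hi[0] * hj[1] + hi[1] * hj[0],
                     hi[1] * hj[1],
                     hi[2] * hj[0] + hi[0] * hj[2],
                     hi[2] * hj[1] + hi[1] * hj[2],
                     hi[2] * hj[2]])

def solve_b(homographies):
    """Stack two constraints per calibration image and solve C b = 0 by SVD."""
    rows = []
    for H in homographies:
        rows.append(c_vec(H, 0, 1))                   # h1^T B h2 = 0
        rows.append(c_vec(H, 0, 0) - c_vec(H, 1, 1))  # h1^T B h1 = h2^T B h2
    C = np.asarray(rows)                              # shape (2L, 6)
    _, _, Vt = np.linalg.svd(C)
    return Vt[-1]   # right singular vector of the smallest singular value: b, up to scale
```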

If there are L calibration images, C is a 2L\times 6 matrix. When L\geq 3, the equation determines the vector \vec{b} uniquely (up to a scale factor). Once \vec{b} is determined, the internal parameter matrix K can also be determined. Then, according to the properties of the rotation matrix R, the external parameters of each calibration image are calculated by the following formula:

\left\{\begin{matrix} \vec{r_1}=\lambda K^{-1}\vec{h_1}\\ \vec{r_2}=\lambda K^{-1}\vec{h_2}\\ \vec{r_3}=\vec{r_1}\times \vec{r_2}\\ t=\lambda K^{-1}\vec{h_3} \end{matrix}\right.

where

\lambda =\frac{1}{\left \| K^{-1}\vec{h_1} \right \|}=\frac{1}{\left \| K^{-1}\vec{h_2} \right \|}
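
This closed-form step translates directly into code. The following sketch (with the intrinsic matrix K and one homography H assumed given) recovers R and t for a single view; with noisy data the resulting R is only approximately orthonormal, which is one reason a nonlinear refinement is applied afterwards.

```python
import numpy as np

def extrinsics_from_homography(K, H):
    """Recover (R, t) of one calibration view from its homography H and the intrinsics K."""
    K_inv = np.linalg.inv(K)
    h1, h2, h3 = H[:, 0], H[:, 1], H[:, 2]
    lam = 1.0 / np.linalg.norm(K_inv @ h1)
    r1 = lam * (K_inv @ h1)
    r2 = lam * (K_inv @ h2)
    r3 = np.cross(r1, r2)
    t = lam * (K_inv @ h3)
    R = np.column_stack([r1, r2, r3])
    # With noisy data R is only approximately a rotation matrix; it is usually
    # re-orthogonalized (e.g. via SVD) before the nonlinear refinement below.
    return R, t
```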

After obtaining these initial estimates of the camera's internal and external parameters, the estimates are refined by minimizing the reprojection error, namely:

min\sum_{i=1}^{n}\sum_{j=1}^{m}\left \| p_{ij}-p(K,D,R_{i},t_{i},P_{j}) \right \|^2

Among them, p(K,D,R_i,t_i,P_j) is the predicted projection of the three-dimensional point P_j computed from the camera's internal and external parameters, and p_{ij} is the detected position of the projection of P_j in the image when the camera is at pose \left [ R_i, t_i \right ].
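
In practice, the whole pipeline (corner detection, closed-form initialization, and nonlinear refinement of the reprojection error) is available in OpenCV. Below is a minimal sketch, assuming a set of checkerboard images under calib/ and a board with 9 x 6 inner corners; both the file location and the board size are assumptions made for illustration.

```python
import glob
import cv2
import numpy as np

pattern_size = (9, 6)                                   # assumed inner-corner grid
objp = np.zeros((pattern_size[0] * pattern_size[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern_size[0], 0:pattern_size[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob("calib/*.png"):                   # assumed image location
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern_size)
    if found:
        criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3)
        corners = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)
        obj_points.append(objp)                         # board points with Z_w = 0
        img_points.append(corners)                      # detected pixel points

# Returns the RMS reprojection error, K, D = (k1, k2, p1, p2, k3),
# and the per-image extrinsics (R_i as a rotation vector, and t_i)
rms, K, D, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print("RMS reprojection error:", rms)
```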

2. Feature extraction and matching

The external parameters of a multi-camera system describe the three-dimensional pose relationships between the different cameras in a common world coordinate system, including the rotation matrix and translation vector between each camera coordinate system and the reference coordinate system. Therefore, when calibrating the external parameters of the cameras, the first step is to fix a common world coordinate system, that is, a unified reference. Here we assume that the three-dimensional structure of the scene in which the cameras are located is known and use it as the reference for extrinsic calibration, in order to illustrate the principle. There are two approaches to camera external parameter calibration: the direct method and the feature point method. The direct method is based on the constant-brightness assumption: the projected position of a feature point in the second image is first predicted from the pose estimate of the first image, and this position is then refined by minimizing the photometric error between the two pixels. The premise of the direct method is that the gray level of a three-dimensional point in space remains unchanged across viewpoints, but in practice gray values are easily affected by illumination and by the camera's position; when the illumination or the camera pose changes, the image gray values change, and the direct method may produce errors in the estimated camera pose. The feature point method is little affected by lighting and is more robust than the direct method, so we introduce the feature point method for camera localization directly.

A feature point is a relatively representative point in an image. When the camera moves slightly, the feature point remains stable in the image and can still be identified as the same point across different images; these correspondences are then used to compute the motion relationship between the images. Common image feature point detection algorithms include SIFT, SURF and ORB. The SIFT algorithm fully accounts for changes in illumination, scale and rotation during image acquisition and achieves high accuracy, but its computational cost is large, so it is slow. The SURF algorithm improves on SIFT and increases the computation speed. The ORB algorithm extracts image features even faster and is, to a certain extent, insensitive to noise and image transformations. In this article, the process of feature point detection with the ORB algorithm is divided into two steps:

1. FAST corner point extraction. First, set the number of corner points to be extracted to N, compute the Harris response for the FAST corners, and keep the N corners with the largest response values as the corner set; then build an image pyramid to achieve scale invariance; finally, use the intensity (gray) centroid method to determine the main orientation of each corner.

2. BRIEF descriptor. The ORB algorithm uses the BRIEF descriptor to describe the FAST corner point, which is composed of many 0s and 1s, and is used to represent the light and dark relationship of the pixel brightness around the key point.

The ORB algorithm uses the 0s and 1s of the BRIEF descriptor to describe the brightness relationship between two pixels (such as p and q) near the key point: if p>q, the bit takes the value 1; otherwise it takes 0. If 256 such pairs p and q are randomly selected around the key point, a 256-dimensional binary vector is obtained, that is, each feature point has a 256-bit binary descriptor. Suppose M feature points x_{t}^{m} (m=1,2,...,M) are extracted from an image I_t, and N feature points x_{t+1}^{n} (n=1,2,...,N) are extracted from the image I_{t+1}. First, the Hamming distance between the descriptor of each feature point x_{t}^{m} in image I_t and the descriptors of all feature points in image I_{t+1} is computed; then the feature point x_{t+1}^{n} with the smallest Hamming distance is selected as the match of x_{t}^{m}; finally, the matches of all feature points are computed in turn, completing the matching between images I_t and I_{t+1}.
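
A minimal OpenCV sketch of this detection-and-matching procedure follows; the image file names are assumptions made for illustration.

```python
import cv2

# Assumed file names for two consecutive frames I_t and I_{t+1}
img_t = cv2.imread("frame_t.png", cv2.IMREAD_GRAYSCALE)
img_t1 = cv2.imread("frame_t1.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=500)                 # keep the 500 best (Harris-ranked) corners
kp_t, des_t = orb.detectAndCompute(img_t, None)     # FAST corners + 256-bit BRIEF descriptors
kp_t1, des_t1 = orb.detectAndCompute(img_t1, None)

# BRIEF descriptors are binary, so matching uses the Hamming distance;
# match() returns, for each descriptor in des_t, its nearest neighbour in des_t1
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
matches = sorted(matcher.match(des_t, des_t1), key=lambda m: m.distance)
print("number of matches:", len(matches))
```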


Origin blog.csdn.net/panpan_jiang1/article/details/126810498