1. Camera calibration principle


   Camera calibration is a foundation of computer vision / machine vision, and a frequent interview topic. It touches a wide range of material: imaging geometry, lens distortion, the homography matrix, nonlinear optimization, and so on. In a binocular (stereo) ranging system, calibration makes it possible to remove distortion and perform stereo rectification, which improves the accuracy of disparity computation and therefore the quality of the resulting depth map.

1. Camera model

1.1 Coordinate systems

  To relate the three-dimensional position of a point in space to its corresponding point in the image, we must establish the geometric model of camera imaging, i.e., the coordinate systems involved. The conversion parameters between these coordinate systems are the camera parameters, and the process of solving for them is called camera calibration. A stereo vision system requires several coordinate systems: the world coordinate system, the camera coordinate system, and the image coordinate system (both physical and pixel).

| Coordinate system | Description |
| --- | --- |
| World coordinate system (3D) | Reference frame describing the position of the target in the real world: (X_w, Y_w, Z_w) |
| Camera coordinate system (3D) | Bridge between the world and image coordinate systems; the camera optical axis is usually taken as the z-axis: (X_c, Y_c, Z_c) |
| Image physical coordinate system (2D) | Introduced via the projection relationship to simplify obtaining pixel coordinates; unit is mm; origin is the intersection of the camera optical axis with the image plane: (x, y) |
| Image pixel coordinate system (2D) | What is actually read from the camera: the discretization of the image physical coordinates; unit is pixels; origin is the upper-left corner: (u, v) |

First, clarify the conversion relationship between each coordinate system:

1. World coordinate system and camera coordinate system
[Figure: rigid transformation between the world and camera coordinate systems]
  This converts one three-dimensional coordinate system (X_w, Y_w, Z_w) into another (X_c, Y_c, Z_c). The transformation between the two frames is a rigid-body transformation: the object changes only its spatial position (translation) and orientation (rotation), not its shape. The conversion is described by a rotation matrix R and a translation vector T, which together form the extrinsic parameter matrix. Once the extrinsic matrix is known, a point given in the world coordinate system can be converted to the camera coordinate system, and vice versa.

2. Camera coordinate system and image physical coordinate system

  This converts a three-dimensional coordinate system into a two-dimensional one; the conversion follows the geometric (pinhole) projection model. A schematic of the projection relationship between the two coordinate systems:
[Figure: projection from the camera coordinate system onto the image plane]

3. Image physical coordinate system and image pixel coordinate system

   First, an intuitive example of the difference between the two coordinate systems. The physical coordinate system is continuous and measured in millimeters, like the exact position (3.4, 5.9) of a seat in a movie theater; the pixel coordinate system is discrete, measured in pixels, and takes only integer coordinates, like describing the same seat as (third row, sixth column). Their origins also differ: the physical coordinate system places its origin at the intersection of the camera optical axis and the image plane, usually called the principal point, while the pixel coordinate system takes the upper-left corner of the image as its origin.
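The theater analogy maps directly onto the conversion between the two image coordinate systems. A tiny sketch, with hypothetical pixel sizes dx, dy (mm per pixel) and principal point (u0, v0):

```python
# Hypothetical pixel size (mm per pixel) and principal point location.
dx, dy = 0.005, 0.005        # 5-micrometre square pixels
u0, v0 = 320, 240            # principal point in pixel coordinates

def physical_to_pixel(x_mm, y_mm):
    """Continuous image-plane coordinates (mm, origin at the principal
    point) -> pixel coordinates (origin at the upper-left corner).
    Actual pixel indices come from rounding the result."""
    return x_mm / dx + u0, y_mm / dy + v0

u, v = physical_to_pixel(0.0, 0.0)   # the principal point lands at (u0, v0)
```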

Converting the world coordinate system to the pixel coordinate system
$$
Z_c \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} =
\begin{pmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} R & T \end{pmatrix}
\begin{pmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{pmatrix}
$$

   Here f is the focal length of the camera, generally in mm; d_x, d_y are the physical sizes of a pixel in the x and y directions; (u_0, v_0) is the principal point (image center). f_x = f/d_x and f_y = f/d_y are the focal lengths expressed in pixels along the x- and y-axes, often called the normalized focal lengths.
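The full world-to-pixel chain can be sketched in a few lines of numpy. All parameter values below are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical intrinsics: focal lengths in pixels and principal point.
fx, fy, u0, v0 = 800.0, 800.0, 320.0, 240.0
K = np.array([[fx, 0, u0],
              [0, fy, v0],
              [0,  0,  1]])

# Hypothetical extrinsics: no rotation, world origin 5 units in front of camera.
R = np.eye(3)
T = np.array([0.0, 0.0, 5.0])

def world_to_pixel(Pw, K, R, T):
    """Project a 3-D world point to pixel coordinates."""
    Pc = R @ Pw + T                        # world -> camera (rigid transform)
    x, y = Pc[0] / Pc[2], Pc[1] / Pc[2]    # camera -> normalized image plane
    uv = K @ np.array([x, y, 1.0])         # normalized -> pixel
    return uv[:2]

uv = world_to_pixel(np.array([1.0, 0.5, 0.0]), K, R, T)
```

Here the point (1, 0.5, 0) sits at depth Z_c = 5, so its normalized coordinates are (0.2, 0.1), giving pixel coordinates (480, 320).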

1.2 Camera Distortion Model

Given the radial coefficients k_1, k_2, k_3 and tangential coefficients p_1, p_2, the distorted normalized coordinates are:

$$
\begin{aligned}
x_{distorted} &= x(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + 2p_1 xy + p_2(r^2 + 2x^2) \\
y_{distorted} &= y(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1(r^2 + 2y^2) + 2p_2 xy
\end{aligned}
$$

where r^2 = x^2 + y^2.
   In practice, 2 or 3 radial coefficients k are usually estimated; this has been shown to give good results. Adding more k terms has negligible benefit and can even make the results worse.

Distortion types: pincushion distortion (k > 0) and barrel distortion (k < 0)
[Figure: pincushion vs. barrel distortion]

When k > 0, points are displaced outward from the center, and the displacement grows with r (the distance from the center), producing a pincushion shape.

When k < 0, points are displaced inward toward the center, again more strongly as r grows, producing a barrel shape.
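As a numerical check of the model above, here is a small sketch of the Brown distortion model; the coefficient values are made up for illustration:

```python
import numpy as np

def distort(x, y, k1, k2, k3=0.0, p1=0.0, p2=0.0):
    """Apply the radial + tangential (Brown) model to normalized coords."""
    r2 = x*x + y*y
    radial = 1.0 + k1*r2 + k2*r2**2 + k3*r2**3
    xd = x*radial + 2*p1*x*y + p2*(r2 + 2*x*x)
    yd = y*radial + p1*(r2 + 2*y*y) + 2*p2*x*y
    return xd, yd

# k1 > 0 pushes points outward (pincushion), k1 < 0 pulls them inward (barrel)
xp, _ = distort(0.5, 0.0, k1=0.1, k2=0.0)    # xp > 0.5
xb, _ = distort(0.5, 0.0, k1=-0.1, k2=0.0)   # xb < 0.5
```

With p1 = p2 = 0 the model reduces to the purely radial form used later in the derivation.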

1.3 Camera Calibration Parameters

Intrinsic parameters:

(focal lengths expressed in pixels) f_x, f_y

(principal point coordinates) c_x, c_y

(distortion coefficients) k_1, k_2, k_3, p_1, p_2

External parameters:
(rotation matrix and translation vector) R, T

2. Zhang Zhengyou calibration method

   Zhang Zhengyou's calibration method uses a checkerboard calibration board. After capturing an image of the board, a corner-detection algorithm can obtain the pixel coordinates (u, v) of each corner point.

   Zhang Zhengyou's method fixes the world coordinate system on the checkerboard plane, so every point on the checkerboard has W = 0. Since the world coordinate system on the board is defined by us and the size of each square is known, we can compute the physical coordinates (U, V, W = 0) of every corner point in the world coordinate system.

   Using this information, the pixel coordinates (u, v) and the world coordinates (U, V, W = 0) of each corner point, the camera can be calibrated to obtain the intrinsic matrix, the extrinsic matrices, and the distortion parameters.

The idea of Zhang Zhengyou's calibration method for solving the intrinsic and extrinsic parameters is as follows:

1) Solve the product of the internal parameter matrix and the external parameter matrix;

2) Solve the internal parameter matrix;

3) Solve the external parameter matrix.

2.1 Solve the product of internal parameter matrix and external parameter matrix

  With the world coordinate system fixed on the checkerboard, any point on the board has W = 0, so the undistorted single-point imaging model reduces to the formula below, where R_1, R_2 are the first two columns of the rotation matrix R. For brevity, the intrinsic matrix is denoted A.
$$
Z \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = A \begin{pmatrix} R_1 & R_2 & T \end{pmatrix} \begin{pmatrix} U \\ V \\ 1 \end{pmatrix}
$$
   A few remarks on the formula above. Across different pictures, the intrinsic matrix A is fixed; within one picture, both A and the extrinsic matrix (R_1 R_2 T) are constant; for a single point on one picture, A, (R_1 R_2 T), and the scale factor Z are all constant.

   We write A(R_1 R_2 T) as the matrix H, the product of the intrinsic and extrinsic matrices, and denote the three columns of H by (H_1, H_2, H_3). Then:

$$
Z \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = H \begin{pmatrix} U \\ V \\ 1 \end{pmatrix} = \begin{pmatrix} H_1 & H_2 & H_3 \end{pmatrix} \begin{pmatrix} U \\ V \\ 1 \end{pmatrix}
$$

Eliminating the scale factor Z gives:

$$
u = \frac{H_{11}U + H_{12}V + H_{13}}{H_{31}U + H_{32}V + H_{33}}, \qquad
v = \frac{H_{21}U + H_{22}V + H_{23}}{H_{31}U + H_{32}V + H_{33}}
$$

   With Z eliminated, these equations hold for every corner point on the same picture. Here (u, v) are the pixel coordinates of a board corner and (U, V) its world coordinates. The corner-detection algorithm gives us (u, v), and since the world coordinate system on the board is defined by us and the square size is known, we can compute (U, V).

   Since H is a homogeneous matrix, it has 8 independent unknown elements. Each corner of the calibration board provides two constraint equations (one relating u to (U, V), one relating v to (U, V)). Therefore, when a picture contains 4 board corners, the matrix H for that picture can be solved; when it contains more than 4 corners, the best-fitting H is obtained by least squares.
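The estimation of H from corner correspondences can be sketched with the standard DLT (direct linear transform): each correspondence contributes two rows of a homogeneous linear system, solved via SVD. The ground-truth homography and board points below are hypothetical:

```python
import numpy as np

def estimate_homography(world_uv, pixel_uv):
    """DLT: each correspondence (U,V) -> (u,v) gives two rows of A h = 0;
    solve for the 9 elements of h (up to scale) via SVD."""
    rows = []
    for (U, V), (u, v) in zip(world_uv, pixel_uv):
        rows.append([U, V, 1, 0, 0, 0, -u*U, -u*V, -u])
        rows.append([0, 0, 0, U, V, 1, -v*U, -v*V, -v])
    A = np.array(rows, dtype=float)
    _, _, Vt = np.linalg.svd(A)
    H = Vt[-1].reshape(3, 3)     # right singular vector of smallest sigma
    return H / H[2, 2]           # normalize so H33 = 1

# Hypothetical ground-truth H and a handful of board corners
H_true = np.array([[2.0, 0.1, 30.0], [0.05, 1.8, 40.0], [1e-4, 2e-4, 1.0]])
world = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
pixels = []
for U, V in world:
    w = H_true @ np.array([U, V, 1.0])
    pixels.append((w[0] / w[2], w[1] / w[2]))
H_est = estimate_homography(world, pixels)
```

With exact correspondences the SVD null vector recovers H exactly; with noisy detections the same SVD gives the least-squares fit mentioned above.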

2.2 Solve the internal parameter matrix

   Knowing the matrix H = A(R_1 R_2 T), we now solve for the camera intrinsic matrix A. Since R_1 and R_2 are two columns of the rotation matrix R, they are orthonormal:

$$
R_1^T R_2 = 0, \qquad R_1^T R_1 = R_2^T R_2 = 1
$$

From the relationship between H and R_1, R_2:

$$
R_1 = A^{-1} H_1, \qquad R_2 = A^{-1} H_2
$$

Substituting, we obtain:

$$
H_1^T A^{-T} A^{-1} H_2 = 0, \qquad H_1^T A^{-T} A^{-1} H_1 = H_2^T A^{-T} A^{-1} H_2 = 1
$$
   Both constraint equations contain the matrix A^{-T}A^{-1}, so we write B = A^{-T}A^{-1}; note that B is a symmetric matrix. We first solve for B, then recover the intrinsic matrix A from it.

For simplicity, write the intrinsic matrix A as:

$$
A = \begin{pmatrix} \alpha & \gamma & u_0 \\ 0 & \beta & v_0 \\ 0 & 0 & 1 \end{pmatrix}
$$

Then:

$$
A^{-1} = \begin{pmatrix} \frac{1}{\alpha} & -\frac{\gamma}{\alpha\beta} & \frac{\gamma v_0 - \beta u_0}{\alpha\beta} \\ 0 & \frac{1}{\beta} & -\frac{v_0}{\beta} \\ 0 & 0 & 1 \end{pmatrix}
$$

so B = A^{-T}A^{-1} can be expressed in terms of the elements of A. Since B is symmetric, the elements B_{12}, B_{13}, B_{23} each appear twice:

$$
B = \begin{pmatrix} B_{11} & B_{12} & B_{13} \\ B_{12} & B_{22} & B_{23} \\ B_{13} & B_{23} & B_{33} \end{pmatrix}
$$

Using B = A^{-T}A^{-1}, the constraint equations obtained from the orthonormality of R_1, R_2 become:

$$
H_1^T B H_2 = 0, \qquad H_1^T B H_1 = H_2^T B H_2 = 1
$$

Therefore, to solve for B we must compute terms of the form H_i^T B H_j. Expanding:

$$
H_i^T B H_j = \begin{pmatrix} H_{1i}H_{1j} & H_{1i}H_{2j}+H_{2i}H_{1j} & H_{2i}H_{2j} & H_{3i}H_{1j}+H_{1i}H_{3j} & H_{3i}H_{2j}+H_{2i}H_{3j} & H_{3i}H_{3j} \end{pmatrix} \begin{pmatrix} B_{11} \\ B_{12} \\ B_{22} \\ B_{13} \\ B_{23} \\ B_{33} \end{pmatrix}
$$

The equation above looks complicated, but it is not; define:

$$
v_{ij} = \begin{pmatrix} H_{1i}H_{1j} \\ H_{1i}H_{2j}+H_{2i}H_{1j} \\ H_{2i}H_{2j} \\ H_{3i}H_{1j}+H_{1i}H_{3j} \\ H_{3i}H_{2j}+H_{2i}H_{3j} \\ H_{3i}H_{3j} \end{pmatrix}, \qquad
b = \begin{pmatrix} B_{11} \\ B_{12} \\ B_{22} \\ B_{13} \\ B_{23} \\ B_{33} \end{pmatrix}
$$

Then the equation can be written compactly as $H_i^T B H_j = v_{ij}^T b$.
At this time, the constraint equation obtained by unit orthogonality of R1 and R2 can be transformed into:
$$
v_{12}^T b = 0, \qquad v_{11}^T b = v_{22}^T b = 1
$$
That is:

$$
\begin{pmatrix} v_{12}^T \\ v_{11}^T - v_{22}^T \end{pmatrix} b = vb = 0
$$

where

$$
v = \begin{pmatrix} v_{12}^T \\ v_{11}^T - v_{22}^T \end{pmatrix}
$$

Since the matrix H is known and v is composed entirely of elements of H, the matrix v is known.

   Now we only need to solve for the vector b to obtain the matrix B. Each calibration-board picture provides one relation vb = 0, i.e., two constraint equations, while b has 6 unknown elements, so a single picture is not enough. With 3 pictures we obtain 3 relations vb = 0, i.e., 6 equations, which suffice to solve for b (up to scale). When more than 3 pictures are available (in practice 15 to 20 are typically used), the best b is fitted by least squares, giving the matrix B.
From the correspondence between the elements of B and the intrinsic parameters α, β, γ, u_0, v_0 (with a scale factor λ, since b is solved only up to scale), we obtain:

$$
\begin{aligned}
v_0 &= \frac{B_{12}B_{13} - B_{11}B_{23}}{B_{11}B_{22} - B_{12}^2} \\
\lambda &= B_{33} - \frac{B_{13}^2 + v_0(B_{12}B_{13} - B_{11}B_{23})}{B_{11}} \\
\alpha &= \sqrt{\lambda / B_{11}} \\
\beta &= \sqrt{\lambda B_{11} / (B_{11}B_{22} - B_{12}^2)} \\
\gamma &= -B_{12}\,\alpha^2\beta / \lambda \\
u_0 &= \frac{\gamma v_0}{\beta} - \frac{B_{13}\,\alpha^2}{\lambda}
\end{aligned}
$$

and the intrinsic matrix of the camera is obtained as:

$$
A = \begin{pmatrix} \alpha & \gamma & u_0 \\ 0 & \beta & v_0 \\ 0 & 0 & 1 \end{pmatrix}
$$
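The whole of section 2.2 can be exercised end-to-end on synthetic data: build the v_ij vectors from homographies, solve vb = 0 via SVD, and recover the intrinsics from B. The intrinsics below are hypothetical and expressed in normalized units (real pixel-unit data is usually coordinate-normalized first to keep the linear system well conditioned):

```python
import numpy as np

def v_ij(H, i, j):
    """Zhang's v_ij vector built from columns i and j of H (0-indexed)."""
    hi, hj = H[:, i], H[:, j]
    return np.array([hi[0]*hj[0],
                     hi[0]*hj[1] + hi[1]*hj[0],
                     hi[1]*hj[1],
                     hi[2]*hj[0] + hi[0]*hj[2],
                     hi[2]*hj[1] + hi[1]*hj[2],
                     hi[2]*hj[2]])

def intrinsics_from_homographies(Hs):
    """Stack v12 and v11 - v22 for every image, solve vb = 0 by SVD,
    then recover alpha, beta, gamma, u0, v0 from B."""
    V = []
    for H in Hs:
        V.append(v_ij(H, 0, 1))
        V.append(v_ij(H, 0, 0) - v_ij(H, 1, 1))
    _, _, Vt = np.linalg.svd(np.array(V))
    B11, B12, B22, B13, B23, B33 = Vt[-1]          # b, up to scale
    v0 = (B12*B13 - B11*B23) / (B11*B22 - B12**2)
    lam = B33 - (B13**2 + v0*(B12*B13 - B11*B23)) / B11
    alpha = np.sqrt(lam / B11)
    beta = np.sqrt(lam * B11 / (B11*B22 - B12**2))
    gamma = -B12 * alpha**2 * beta / lam
    u0 = gamma*v0/beta - B13*alpha**2/lam
    return np.array([[alpha, gamma, u0], [0.0, beta, v0], [0.0, 0.0, 1.0]])

def rot(ax, ay, az):
    """Rotation matrix from three Euler angles (Rz @ Ry @ Rx)."""
    cx, sx = np.cos(ax), np.sin(ax)
    cy, sy = np.cos(ay), np.sin(ay)
    cz, sz = np.cos(az), np.sin(az)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

# Hypothetical camera and three hypothetical board poses.
A_true = np.array([[1.2, 0.0, 0.3], [0.0, 1.1, 0.25], [0.0, 0.0, 1.0]])
Hs = []
for ax, ay, az in [(0.1, 0.2, 0.0), (-0.2, 0.15, 0.1), (0.25, -0.1, -0.2)]:
    R = rot(ax, ay, az)
    Hs.append(A_true @ np.column_stack([R[:, 0], R[:, 1], [0.1, -0.2, 2.0]]))
A_est = intrinsics_from_homographies(Hs)
```

Note that only the first two columns of each H enter the v_ij vectors, which matches the derivation: the translation plays no role in recovering the intrinsics.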

2.3 Solve the external parameter matrix

   To repeat: for a given camera, the intrinsic matrix depends only on the camera's internal properties; it remains the same regardless of the position of the calibration board relative to the camera. This is why, in section 2.2 "Solve the internal parameter matrix", the matrices H obtained from different pictures (different board-camera poses) could be combined to solve jointly for the intrinsic matrix A.

   The extrinsic matrix, however, reflects the positional relationship between the calibration board and the camera; this relationship changes from picture to picture, so each picture has its own extrinsic matrix.

   In the relation A(R_1 R_2 T) = H, the matrix H has already been solved (the same within one picture, different across pictures), as has the matrix A (the same across all pictures). Through (R_1 R_2 T) = A^{-1}H, the extrinsic matrix corresponding to each picture is obtained.

   It is worth pointing out that the complete extrinsic matrix is:

$$
\begin{pmatrix} R & T \\ 0 & 1 \end{pmatrix}
$$

However, because Zhang Zhengyou's method places the origin of the world coordinate system on the checkerboard, every point on the board has W = 0, which cancels the third column R_3 of the rotation matrix R; R_3 plays no role in the coordinate transformation. But R_3 is still needed for R to be a valid rotation matrix (orthonormal columns), so it can be recovered as the cross product R_3 = R_1 × R_2.
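The recovery of (R, T) from H and A can be sketched as follows; the camera and pose values are hypothetical:

```python
import numpy as np

def extrinsics_from_H(A, H):
    """Recover R and T from H = A @ (R1 R2 T); the unknown scale is fixed
    by requiring the rotation columns to be unit vectors."""
    A_inv = np.linalg.inv(A)
    lam = 1.0 / np.linalg.norm(A_inv @ H[:, 0])
    r1 = lam * (A_inv @ H[:, 0])
    r2 = lam * (A_inv @ H[:, 1])
    r3 = np.cross(r1, r2)            # R3 = R1 x R2 completes the rotation
    t = lam * (A_inv @ H[:, 2])
    return np.column_stack([r1, r2, r3]), t

# Hypothetical intrinsics and a hypothetical board pose.
A = np.array([[1000.0, 0.0, 320.0], [0.0, 950.0, 240.0], [0.0, 0.0, 1.0]])
c, s = np.cos(0.3), np.sin(0.3)
R_true = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
t_true = np.array([0.2, -0.1, 3.0])
H = A @ np.column_stack([R_true[:, 0], R_true[:, 1], t_true])
R_est, t_est = extrinsics_from_H(A, H)
```

With noisy data the recovered R is not exactly orthogonal; a common follow-up (not shown) is to project it back onto the nearest rotation matrix via SVD.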

At this point, both the intrinsic and extrinsic parameter matrices of the camera have been obtained.

3. Calibrate the distortion parameters of the camera

Zhang Zhengyou's calibration method considers only radial distortion, which dominates in the distortion model.

The second-order radial distortion formula is:

$$
\begin{aligned}
\hat{x} &= x(1 + k_1 r^2 + k_2 r^4) \\
\hat{y} &= y(1 + k_1 r^2 + k_2 r^4)
\end{aligned}
$$

Here (x, y) and ($\hat{x}$, $\hat{y}$) are the ideal (undistorted) and distorted normalized image coordinates respectively, and r is the distance from the image point to the image center, i.e. r^2 = x^2 + y^2.

The conversion between image (physical) coordinates and pixel coordinates is:

$$
\begin{pmatrix} u \\ v \\ 1 \end{pmatrix} =
\begin{pmatrix} \frac{1}{dX} & -\frac{\cot\theta}{dX} & u_0 \\ 0 & \frac{1}{dY\sin\theta} & v_0 \\ 0 & 0 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix}
$$

Here (u, v) are the ideal undistorted pixel coordinates. Since θ is close to 90°, the formula above is approximated as:

$$
u = x/dX + u_0, \qquad v = y/dY + v_0
$$
Similarly, the distorted pixel coordinates ($\hat{u}$, $\hat{v}$) satisfy:

$$
\hat{u} = \hat{x}/dX + u_0, \qquad \hat{v} = \hat{y}/dY + v_0
$$
Substituting into the second-order radial distortion formula gives:

$$
\begin{aligned}
\hat{u} - u_0 &= (u - u_0)(1 + k_1 r^2 + k_2 r^4) \\
\hat{v} - v_0 &= (v - v_0)(1 + k_1 r^2 + k_2 r^4)
\end{aligned}
$$
which simplifies to:

$$
\begin{aligned}
\hat{u} &= u + (u - u_0)(k_1 r^2 + k_2 r^4) \\
\hat{v} &= v + (v - v_0)(k_1 r^2 + k_2 r^4)
\end{aligned}
$$
That is:

$$
\begin{pmatrix} (u - u_0)r^2 & (u - u_0)r^4 \\ (v - v_0)r^2 & (v - v_0)r^4 \end{pmatrix}
\begin{pmatrix} k_1 \\ k_2 \end{pmatrix} =
\begin{pmatrix} \hat{u} - u \\ \hat{v} - v \end{pmatrix}
$$
   In the formula above, $\hat{u}$ and $\hat{v}$ are obtained by detecting the corners of the calibration board, and each corner contributes two such equations. With m images and n board corners per image, stacking all the equations gives 2mn constraint equations in the two unknowns k = [k_1, k_2]^T. Writing the coefficient matrix as D and the right-hand side as d, this takes the matrix form:
$$
Dk = d
$$
The least-squares solution is then:

$$
k = \begin{pmatrix} k_1 \\ k_2 \end{pmatrix} = (D^T D)^{-1} D^T d
$$
At this point, the distortion correction parameters of the camera have been calibrated.

It should be pointed out that the derivation above uses second-order radial distortion as an example; higher-order radial distortion works the same way, only requiring more constraint equations.
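The stacked system Dk = d can be exercised on synthetic data: generate ideal and distorted pixel coordinates from assumed k_1, k_2 values, then recover them with the normal equations. All values below are hypothetical:

```python
import numpy as np

def solve_k(uv_ideal, uv_dist, r2_list, u0, v0):
    """Build D k = d (two rows per corner) and solve k = (D^T D)^-1 D^T d."""
    D, d = [], []
    for (u, v), (ud, vd), r2 in zip(uv_ideal, uv_dist, r2_list):
        D.append([(u - u0) * r2, (u - u0) * r2**2])
        D.append([(v - v0) * r2, (v - v0) * r2**2])
        d += [ud - u, vd - v]
    D, d = np.array(D), np.array(d)
    return np.linalg.solve(D.T @ D, D.T @ d)

# Assumed distortion coefficients and a hypothetical camera.
k1, k2 = 0.1, 0.01
u0, v0, fx, fy = 320.0, 240.0, 800.0, 800.0
pts = [(0.1, 0.2), (0.3, -0.1), (-0.2, 0.4), (0.5, 0.5)]   # normalized coords
uv_ideal, uv_dist, r2s = [], [], []
for x, y in pts:
    r2 = x*x + y*y
    u, v = fx*x + u0, fy*y + v0
    uv_ideal.append((u, v))
    uv_dist.append((u + (u - u0)*(k1*r2 + k2*r2**2),
                    v + (v - v0)*(k1*r2 + k2*r2**2)))
    r2s.append(r2)
k_est = solve_k(uv_ideal, uv_dist, r2s, u0, v0)
```

Note that r^2 here is computed from the normalized image coordinates, as in the derivation.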

4. LM algorithm parameter optimization
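In Zhang's method, the closed-form solution above serves only as an initial guess; the final step refines all parameters jointly by minimizing the total reprojection error with the Levenberg-Marquardt (LM) algorithm. A minimal, self-contained LM loop is sketched below on a hypothetical toy curve-fitting problem (a stand-in for the full reprojection-error objective):

```python
import numpy as np

def lm_fit(residual_fn, jac_fn, p0, iters=50, lam=1e-3):
    """Minimal Levenberg-Marquardt: solve (J^T J + lam*I) dp = -J^T r,
    shrink lam when the step lowers the cost, grow it otherwise (falling
    back toward gradient descent)."""
    p = np.asarray(p0, dtype=float)
    cost = np.sum(residual_fn(p)**2)
    for _ in range(iters):
        r, J = residual_fn(p), jac_fn(p)
        dp = np.linalg.solve(J.T @ J + lam * np.eye(len(p)), -J.T @ r)
        new_cost = np.sum(residual_fn(p + dp)**2)
        if new_cost < cost:
            p, cost, lam = p + dp, new_cost, lam * 0.5
        else:
            lam *= 10.0
    return p

# Toy problem: fit y = a * exp(b * x), starting from a rough initial guess.
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * np.exp(-1.5 * x)
res = lambda p: p[0] * np.exp(p[1] * x) - y
jac = lambda p: np.column_stack([np.exp(p[1] * x),
                                 p[0] * x * np.exp(p[1] * x)])
p_hat = lm_fit(res, jac, [1.0, 0.0])
```

In the calibration setting, the residual vector stacks the reprojection errors of every corner in every image, and the parameter vector contains the intrinsics, the distortion coefficients, and one pose per image; libraries such as OpenCV perform this refinement internally.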


Origin blog.csdn.net/baidu_39231810/article/details/128628651