[Stereo Vision (1)] Imaging Principle and Camera Distortion

This is a personal study note. I have borrowed many good pictures and articles (references are at the end of the article), mainly to organize the relevant knowledge into my own system. The text is pieced together from several sources, and the formulas were typed in by hand. Please point out any mistakes.

1. Imaging principle

1) Pinhole model

In scientific research, the internal workings of a phenomenon are often complicated and hard to see clearly, so scholars begin with the simplest possible description and propose a relatively simple model. The imaging process is no exception: the mapping of coordinate points in the three-dimensional world onto the two-dimensional image plane is described by a geometric model. There are many such models, and the simplest is called the pinhole model.
Note that in exploring imaging, a virtual imaging plane is often used for analysis.

2) Coordinate system conversion

Going from a point in the real world (world coordinate system) to its imaged point on a photo (pixel coordinate system) involves conversions among the following four coordinate systems.

  • World Coordinate System (WCS): describes the position of a point in the real world, and of the camera itself; unit: meters.
  • Camera Coordinate System (CCS): a three-dimensional coordinate system with the camera's optical center as the origin; unit: meters.
  • Image Coordinate System (Film Coordinate System, FCS): a two-dimensional coordinate system with the principal point as the origin; unit: millimeters.
  • Pixel Coordinate System (PCS): a two-dimensional coordinate system whose origin is the upper-left corner of the image; unit: pixels (dimensionless).

Next, follow the steps below to introduce the transformation of a point in each coordinate system during the imaging process of the camera.

1. World coordinate system to camera coordinate system

From the world coordinate system to the camera coordinate system (World to Camera, W2C) is a simple three-dimensional coordinate transformation (a change of origin and of the directions of the three coordinate axes). A point $P_w(U,V,W)$ in the world coordinate system maps to a point $P_c(X,Y,Z)$ in the camera coordinate system:

$$\begin{bmatrix} X\\ Y\\ Z \end{bmatrix}=\begin{bmatrix} R_{3\times3}&T_{3\times1} \end{bmatrix}\begin{bmatrix} U\\ V\\ W\\ 1 \end{bmatrix}$$

or using the second matrix expression:

$$\begin{bmatrix} X\\ Y\\ Z\\ 1 \end{bmatrix}=\begin{bmatrix} R_{3\times3}&T_{3\times1}\\ O&1 \end{bmatrix}\begin{bmatrix} U\\ V\\ W\\ 1 \end{bmatrix}$$

Here $\begin{bmatrix} R_{3\times3}&T_{3\times1} \end{bmatrix}$ (or $\begin{bmatrix} R_{3\times3}&T_{3\times1}\\ O&1 \end{bmatrix}$) is called the camera's extrinsic matrix (Extrinsic Matrix); it describes the camera's position in world coordinates ($T_{3\times1}$) and its orientation ($R_{3\times3}$).
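As a quick numeric sketch of the W2C step (all values below are made up for illustration), applying the extrinsics is just `R @ P_w + t`:

```python
import numpy as np

# Hypothetical extrinsics: rotate 90 degrees about Z, then translate by t
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta), 0],
              [np.sin(theta),  np.cos(theta), 0],
              [0,              0,             1]])
t = np.array([0.0, 0.0, 2.0])

P_w = np.array([1.0, 0.0, 0.0])   # point (U, V, W) in world coordinates, in m
P_c = R @ P_w + t                 # same as [R | T] applied to (U, V, W, 1)
print(np.round(P_c, 6))           # -> [0. 1. 2.]
```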


2. Camera coordinate system to image coordinate system

From the camera coordinate system to the image coordinate system (Camera to Film, C2F), the three-dimensional coordinates $P_c(X,Y,Z)$ are reduced to the two-dimensional coordinates $p(x,y)$.

According to similar triangles, we get:

$$\frac{x}{X}=\frac{y}{Y}=\frac{f}{Z}$$

Then:

$$\begin{bmatrix} x\\ y\\ 1 \end{bmatrix}=\begin{bmatrix} \frac{f}{Z}&0&0\\ 0&\frac{f}{Z}&0\\ 0&0&\frac{1}{Z} \end{bmatrix}\begin{bmatrix} X\\ Y\\ Z \end{bmatrix}$$


3. Image coordinate system to pixel coordinate system

The image coordinate system to pixel coordinate system (Film to Pixel, F2P) conversion is a simple affine transformation (Affine Transformation), consisting mainly of a shift of the origin and a change of scale.
We have:

$$u=\frac{x}{d_x}+u_0,\qquad v=\frac{y}{d_y}+v_0$$

In matrix form:

$$\begin{bmatrix} u\\ v\\ 1 \end{bmatrix}=\begin{bmatrix} \frac{1}{d_x}&0&u_0\\ 0&\frac{1}{d_y}&v_0\\ 0&0&1 \end{bmatrix}\begin{bmatrix} x\\ y\\ 1 \end{bmatrix}$$


4. Camera coordinate system to pixel coordinate system

Now chain the steps together. Combining the two previous steps, the camera coordinate system to pixel coordinate system (Camera to Pixel, C2P) transformation is:

$$\begin{bmatrix} u\\ v\\ 1 \end{bmatrix}=\begin{bmatrix} \frac{1}{d_x}&0&u_0\\ 0&\frac{1}{d_y}&v_0\\ 0&0&1 \end{bmatrix}\begin{bmatrix} \frac{f}{Z}&0&0\\ 0&\frac{f}{Z}&0\\ 0&0&\frac{1}{Z} \end{bmatrix}\begin{bmatrix} X\\ Y\\ Z \end{bmatrix}=\frac{1}{Z}\begin{bmatrix} \frac{f}{d_x}&0&u_0\\ 0&\frac{f}{d_y}&v_0\\ 0&0&1 \end{bmatrix}\begin{bmatrix} X\\ Y\\ Z \end{bmatrix}$$

Here $\frac{f}{d_x}$ and $\frac{f}{d_y}$ are the focal length divided by the pixel size, in units of pixels: they express the focal length in pixel units along the two pixel axes. During camera calibration, $f$, $d_x$, $d_y$ cannot be measured directly, but the combined values $\frac{f}{d_x}$ and $\frac{f}{d_y}$ can be obtained by calibration, so $f_x=\frac{f}{d_x}$ and $f_y=\frac{f}{d_y}$ are used to denote the two combined values. Then:

$$\begin{bmatrix} u\\ v\\ 1 \end{bmatrix}=\frac{1}{Z}\begin{bmatrix} f_x&0&u_0\\ 0&f_y&v_0\\ 0&0&1 \end{bmatrix}\begin{bmatrix} X\\ Y\\ Z \end{bmatrix}$$

If the two axes of the pixel coordinate system are not perpendicular (the pixel is then not a rectangle but a parallelogram), the camera intrinsics also include a skew coefficient $s=f_x\tan(\alpha)$; the derivation is left to interested readers.

Then:

$$\begin{bmatrix} u\\ v\\ 1 \end{bmatrix}=\frac{1}{Z}\begin{bmatrix} f_x&s&u_0\\ 0&f_y&v_0\\ 0&0&1 \end{bmatrix}\begin{bmatrix} X\\ Y\\ Z \end{bmatrix}$$

Here $\begin{bmatrix} f_x&s&u_0\\ 0&f_y&v_0\\ 0&0&1 \end{bmatrix}$ is the camera's intrinsic matrix (Intrinsics Matrix). It describes the transformation from the camera coordinate system to the pixel coordinate system and reflects properties of the camera itself; it is usually denoted by $K$ (note that in many versions the intrinsic matrix contains no skew coefficient, i.e. $s=0$ by default). In the camera coordinate system, $Z$ corresponds to the depth of the three-dimensional point and is often called the scale factor $\lambda$, so the camera-to-pixel transformation becomes:

$$\lambda\begin{bmatrix} u\\ v\\ 1 \end{bmatrix}=\begin{bmatrix} f_x&s&u_0\\ 0&f_y&v_0\\ 0&0&1 \end{bmatrix}\begin{bmatrix} X\\ Y\\ Z \end{bmatrix}$$
Denoting:

$$K=\begin{bmatrix} f_x&s&u_0\\ 0&f_y&v_0\\ 0&0&1 \end{bmatrix}$$

we have:

$$\lambda p=KP_c$$

Supplement: the plane $Z=1$ is called the normalized plane, and coordinates on the normalized plane are called normalized coordinates.
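The projection $\lambda p=KP_c$ can be checked numerically; the intrinsics below are assumed values, not from any real camera:

```python
import numpy as np

# Assumed intrinsics: fx = fy = 800 px, principal point (320, 240), zero skew
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

P_c = np.array([0.1, -0.05, 2.0])   # point in camera coordinates
p = K @ P_c                          # lambda * (u, v, 1), with lambda = Z = 2
u, v = p[:2] / p[2]                  # perspective division recovers the pixel
print(u, v)                          # -> 360.0 220.0
```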


5. World coordinate system to pixel coordinate system

From the world coordinate system to the pixel coordinate system (World to Pixel, W2P), the whole chain is:

$$\lambda\begin{bmatrix} u\\ v\\ 1 \end{bmatrix}=\begin{bmatrix} f_x&s&u_0\\ 0&f_y&v_0\\ 0&0&1 \end{bmatrix}\begin{bmatrix} X\\ Y\\ Z \end{bmatrix}=\begin{bmatrix} f_x&s&u_0\\ 0&f_y&v_0\\ 0&0&1 \end{bmatrix}\begin{bmatrix} R_{3\times3}&T_{3\times1} \end{bmatrix}\begin{bmatrix} U\\ V\\ W\\ 1 \end{bmatrix}$$

The conversion from the world coordinate system to the pixel coordinate system expresses the projection from a spatial point to an image point under perspective projection, so the transformation matrix is called the projection matrix (Projection Matrix), commonly denoted $M$. The projection matrix is a $3\times4$ matrix: the product of the intrinsic matrix and the extrinsic matrix.

$$\lambda p=K\begin{bmatrix} R_{3\times3}&T_{3\times1} \end{bmatrix}\begin{bmatrix} P_w\\ 1 \end{bmatrix}=M\begin{bmatrix} P_w\\ 1 \end{bmatrix}$$

In addition, if the second matrix expression is used, the matrix dimension will change.
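Putting the whole W2P chain together as a sketch, with assumed intrinsics and extrinsics (identity rotation, world origin 5 m in front of the camera along Z):

```python
import numpy as np

K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])   # assumed intrinsics
R = np.eye(3)                            # camera axes aligned with world axes
t = np.array([0.0, 0.0, 5.0])            # assumed translation, in m

M = K @ np.hstack([R, t[:, None]])       # 3x4 projection matrix M = K [R | T]
P_w_h = np.array([1.0, 1.0, 0.0, 1.0])   # homogeneous world point (U, V, W, 1)
p = M @ P_w_h                            # lambda * (u, v, 1)
u, v = p[:2] / p[2]
print(u, v)                              # -> 480.0 400.0
```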


2. Camera Distortion

Due to the inherent characteristics of camera lenses (a convex lens converges light, a concave lens diverges it), straight lines in the scene are imaged as curves.

Camera distortion mainly includes radial distortion and tangential distortion.

Distortion characteristics:

  • Radial distortion is mainly caused by the geometry of the lens, which changes the shape of straight lines.
  • Tangential distortion is mainly caused by the lens not being mounted perfectly parallel to the image plane.
  • In practice, image geometry is more concerned with radial distortion, so distortion correction sometimes ignores the tangential component.

1) Radial distortion

Radial distortion mainly includes:

  • barrel distortion
  • pincushion distortion
  • mustache distortion

It can be seen that the characteristics of the radially distorted image are:

  • Symmetric about the optical center
  • Straight lines become curved
| Distortion | Features | Scenes |
| --- | --- | --- |
| Barrel distortion | The center is magnified; the farther from the optical center, the smaller the magnification | Fisheye lenses, wide-angle/panoramic pictures |
| Pincushion distortion | The picture is squeezed inward, like a squashed pillow | Telephoto lenses often introduce pincushion distortion while eliminating the spherical effect |
| Mustache distortion | A mixture of the two types above: barrel distortion near the optical axis, turning into pincushion distortion toward the image edges | |

Radial Distortion Correction

In general, the radial distortion of an image can be represented by a low-order polynomial model:

$$x_{undist}=x_{dist}(1+k_1r^2+k_2r^4+k_3r^6)\\ y_{undist}=y_{dist}(1+k_1r^2+k_2r^4+k_3r^6)$$

where $r^2=x_{dist}^2+y_{dist}^2$, and $(x_{dist},y_{dist})$ is a point in the normalized camera coordinate system, that is, the coordinate origin has been moved to the principal point and the pixel coordinates divided by the focal length: $x_{dist}=\frac{X}{Z}=\frac{u-u_0}{f_x}$, $y_{dist}=\frac{Y}{Z}=\frac{v-v_0}{f_y}$, which can be obtained from the distorted image. $k_1, k_2, k_3$ are the radial distortion parameters (generally the first two polynomial terms suffice; the third term is only needed for cameras with large distortion, such as fisheye lenses, and keeping all three terms is always valid), and they can be obtained through checkerboard calibration.
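The radial model above can be written directly as a small function; the coefficient values below are hypothetical, chosen only to illustrate the computation:

```python
def radial_undistort(x_d, y_d, k1, k2, k3=0.0):
    """Apply the radial polynomial model to normalized distorted coordinates."""
    r2 = x_d**2 + y_d**2            # r^2 from the distorted coordinates
    factor = 1 + k1*r2 + k2*r2**2 + k3*r2**3
    return x_d * factor, y_d * factor

# Hypothetical coefficients for a mildly barrel-distorted lens
x_u, y_u = radial_undistort(0.1, 0.2, k1=-0.2, k2=0.05)
print(x_u, y_u)                     # approximately 0.0990125 and 0.198025
```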

2) Tangential Distortion

Tangential distortion is caused by the camera sensor not being parallel to the lens.

Tangential Distortion Correction

The tangential distortion of the image can also be represented by a low-order polynomial model:
$$x_{undist}=x_{dist}+\left[2p_1x_{dist}y_{dist}+p_2\left(r^2+2x_{dist}^2\right)\right]\\ y_{undist}=y_{dist}+\left[p_1\left(r^2+2y_{dist}^2\right)+2p_2x_{dist}y_{dist}\right]$$

$p_1$ and $p_2$ are the tangential distortion parameters.

3) Total Distortion Correction

Consider both radial and tangential distortions:

$$x_{undist}=x_{dist}(1+k_1r^2+k_2r^4+k_3r^6)+\left[2p_1x_{dist}y_{dist}+p_2\left(r^2+2x_{dist}^2\right)\right]\\ y_{undist}=y_{dist}(1+k_1r^2+k_2r^4+k_3r^6)+\left[p_1\left(r^2+2y_{dist}^2\right)+2p_2x_{dist}y_{dist}\right]$$
There are five distortion parameters in total: $k_1, k_2, k_3, p_1, p_2$. These five parameters, together with the intrinsic matrix, are what camera calibration must determine. Note, however, that the output order in OpenCV camera calibration is $k_1, k_2, p_1, p_2, k_3$, because $k_3$ is the least important and is not needed in many cases.
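The combined model can be sketched as one function taking coefficients in the OpenCV order $(k_1, k_2, p_1, p_2, k_3)$; the function name and sample values are my own, not part of any library API:

```python
def undistort_point(x_d, y_d, dist):
    """Combined radial + tangential model; `dist` = (k1, k2, p1, p2, k3)."""
    k1, k2, p1, p2, k3 = dist
    r2 = x_d**2 + y_d**2
    radial = 1 + k1*r2 + k2*r2**2 + k3*r2**3
    x_u = x_d*radial + 2*p1*x_d*y_d + p2*(r2 + 2*x_d**2)
    y_u = y_d*radial + p1*(r2 + 2*y_d**2) + 2*p2*x_d*y_d
    return x_u, y_u

# With all coefficients zero, the model is the identity
assert undistort_point(0.3, -0.1, (0, 0, 0, 0, 0)) == (0.3, -0.1)
```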


References:
[1] CSE/EE486 Computer Vision I
[2] A detailed explanation of camera calibration algorithm principle
[3] Stereo Vision Getting Started Guide (1): Coordinate System and Camera Parameters
[4] Camera calibration (Camera calibration) principle and steps

Origin blog.csdn.net/m0_50910915/article/details/130878017