[Multi-view geometry series in computer vision] Understanding the pinhole camera model in simple terms

If you review the past and learn the new, you can become a teacher!

1. Reference materials

"Multiple-view Geometry in Computer Vision - Chapter 5" -Richard Hartley, Andrew Zisserman.

2. Introduction to the pinhole model

1. Important concepts

1.1 Projection center/camera center/optical center

The projection center is called the camera center, also known as the optical center. It is located at the origin of a Euclidean coordinate system.

1.2 Image plane/focal plane

The plane $Z=f$ is called the image plane or focal plane.

1.3 Principal axis/principal ray

The line from the camera center perpendicular to the image plane is called the principal axis or principal ray of the camera.

1.4 Principal point

The intersection of the principal axis and the image plane is called the principal point.

1.5 Principal plane (of the camera)

The plane through the camera center parallel to the image plane is called the principal plane of the camera.

2. Camera projection

The reduction from the 3D world to a 2D image is a projection process in which one dimension is lost. A common way to model this process is central projection: a ray is drawn from a 3D world point through a fixed point in space (the projection center). This ray intersects a specific plane chosen as the image plane, and the intersection of the ray with the image plane is the image of the point.

Under the pinhole camera model, a point in space with coordinates $\mathbf{X}=(X,Y,Z)^T$ is projected to the point on the image plane where the straight line joining $\mathbf{X}$ to the projection center meets the image plane. By similar triangles, the point $(X,Y,Z)^T$ is mapped to the point $(fX/Z, fY/Z, f)^T$ on the image plane. Dropping the last image coordinate, the central projection from world coordinates to image coordinates is:
$$(X,Y,Z)^T \mapsto (fX/Z,\ fY/Z)^T \quad (1)$$
This is a mapping from 3-dimensional Euclidean space $\mathbb{R}^3$ to 2-dimensional Euclidean space $\mathbb{R}^2$.
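
Formula (1) can be sketched in a few lines of Python. The function name `project_pinhole` and the sample values are illustrative assumptions, not part of the reference text; the point is assumed to be given in the camera's own coordinate system.

```python
def project_pinhole(point, f):
    """Central projection of formula (1): (X, Y, Z) -> (f*X/Z, f*Y/Z)."""
    X, Y, Z = point
    if Z == 0:
        # Points with Z = 0 lie in the principal plane and have no finite image.
        raise ValueError("Z must be nonzero")
    return (f * X / Z, f * Y / Z)


# Hypothetical example: focal length 2, point 4 units in front of the camera.
print(project_pinhole((2.0, 1.0, 4.0), f=2.0))  # (1.0, 0.5)
```

Note the division by $Z$: the mapping is not linear in $(X, Y, Z)$, which is what motivates the homogeneous formulation in the next section.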

3. Projection matrix

The concept of homogeneous coordinates: homogeneous coordinates use N+1 numbers to describe an N-dimensional point.
For example, an image point has homogeneous coordinates $\mathbf{x}=(x_1,x_2,x_3)^T$ (a 3-dimensional vector) and non-homogeneous coordinates $(x,y)^T$ (a 2-dimensional vector). The two are related by $x = x_1/x_3$ and $y = x_2/x_3$ when $x_3 \neq 0$.
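
As a minimal sketch of the conversion between the two representations, assuming NumPy is available (the helper names `to_homogeneous` and `from_homogeneous` are hypothetical, chosen only for illustration):

```python
import numpy as np


def to_homogeneous(p):
    """Append a 1, turning an N-vector into an (N+1)-vector."""
    return np.append(np.asarray(p, dtype=float), 1.0)


def from_homogeneous(h):
    """Divide by the last coordinate and drop it."""
    h = np.asarray(h, dtype=float)
    if h[-1] == 0:
        raise ValueError("last coordinate is zero: no finite non-homogeneous point")
    return h[:-1] / h[-1]


x_h = to_homogeneous([3.0, 4.0])               # (3, 4, 1)
# Any nonzero multiple of x_h represents the same image point.
print(from_homogeneous(2.5 * x_h).tolist())    # [3.0, 4.0]
```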

If the world and image points are represented by homogeneous vectors, then central projection becomes a linear mapping between their homogeneous coordinates. Specifically, formula (1) can be written as the following matrix multiplication:
$$\begin{bmatrix}X\\Y\\Z\\1\end{bmatrix}\mapsto\begin{bmatrix}fX\\fY\\Z\end{bmatrix}=\begin{bmatrix}f&0&0&0\\0&f&0&0\\0&0&1&0\end{bmatrix}\begin{bmatrix}X\\Y\\Z\\1\end{bmatrix}\quad(2)$$
The $3\times4$ matrix in formula (2) is the homogeneous camera projection matrix, denoted $P$. $P$ can be written as $\mathrm{diag}(f,f,1)[I\mid 0]$, where $\mathrm{diag}(f,f,1)$ is a diagonal matrix and $[I\mid 0]$ denotes a matrix split into a $3\times3$ identity matrix plus a zero column vector. The camera projection matrix of the central-projection pinhole model can therefore be expressed as:
$$P=\mathrm{diag}(f,f,1)[I\mid 0]$$
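
The following NumPy sketch builds $P=\mathrm{diag}(f,f,1)[I\mid 0]$ and applies it to a homogeneous point. The focal length and the point are made-up illustrative values.

```python
import numpy as np

f = 2.0  # illustrative focal length

# [I | 0]: a 3x3 identity block followed by a zero column.
I_0 = np.hstack([np.eye(3), np.zeros((3, 1))])
P = np.diag([f, f, 1.0]) @ I_0

X = np.array([2.0, 1.0, 4.0, 1.0])  # homogeneous world point (X, Y, Z, 1)
x = P @ X                           # homogeneous image point (fX, fY, Z)
xy = x[:2] / x[2]                   # non-homogeneous image point (fX/Z, fY/Z)

print(x.tolist())   # [4.0, 2.0, 4.0]
print(xy.tolist())  # [1.0, 0.5]
```

The final division recovers the same result as formula (1), while the projection itself is a single matrix product.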

The concept of the identity matrix: the identity matrix, also known as the unit matrix, is a square matrix whose diagonal elements are 1 and whose remaining elements are 0. It is denoted $I$ or $E$. The size of the identity matrix is determined by its order; for example, the third-order identity matrix is a $3\times3$ matrix.

Identity matrices have many important properties in linear algebra. For example, for any matrix $A$ of compatible size, the product of the identity matrix and $A$ equals $A$ itself: $IA = A$ and $AI = A$. This is because row $i$ of $I$ has a single 1 in column $i$, so row $i$ of $IA$ is exactly row $i$ of $A$. The identity matrix is also what defines the inverse: $A^{-1}$ is the matrix satisfying $AA^{-1} = A^{-1}A = I$.
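
A quick numerical check of these properties, as a sketch using NumPy with an arbitrary invertible example matrix:

```python
import numpy as np

# Arbitrary example matrix (its determinant is -3, so it is invertible).
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [7.0, 8.0, 10.0]])
I = np.eye(3)

left = I @ A                       # multiply by the identity on the left
right = A @ I                      # multiply by the identity on the right
inv_check = A @ np.linalg.inv(A)   # should equal I up to rounding error

print(np.array_equal(left, A), np.array_equal(right, A))  # True True
print(np.allclose(inv_check, I))                          # True
```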

Identity matrices also appear in deep learning. In neural networks, a weight matrix is sometimes initialized to the identity matrix (for example, the recurrent weights of a recurrent network). With identity weights a layer initially passes its input through unchanged, which can help mitigate vanishing or exploding gradients early in training. This is one initialization choice among several rather than a general rule.

The identity matrix can also serve as a reference when comparing matrices. In image processing and pattern recognition we often need to measure how similar two matrices are, typically through a norm of their difference. In the same way, the norm of $A - I$ measures how far a transformation $A$ is from the identity transformation, that is, from leaving its input unchanged.

We now introduce the following notation: the world point $\mathbf{X}$ is represented by the 4-dimensional homogeneous vector $(X,Y,Z,1)^T$, and the image point $\mathbf{x}$ is represented by a 3-dimensional homogeneous vector. Formula (2) can then be written compactly as:
$$\mathbf{x}=P\mathbf{X}$$

4. Principal point offset

Formula (1) assumes that the coordinate origin of the image plane is at the principal point. In practice this may not be the case, as shown in the figure below:

[Figure: image coordinate system $(x,y)^T$ and camera coordinate system $(x_{cam},y_{cam})^T$]
The origin of the camera coordinate system $(x_{cam},y_{cam})^T$ is the camera center, and the projection of this origin onto the image plane is the principal point $p$. The origin of the image coordinate system $(x,y)^T$ is the lower-left corner of the image.

Therefore, the mapping in the general case is:
$$(X,Y,Z)^T\mapsto(fX/Z+p_x,\ fY/Z+p_y)^T$$
where $(p_x,p_y)^T$ are the coordinates of the principal point. In homogeneous coordinates this can be expressed as:
$$\begin{bmatrix}X\\Y\\Z\\1\end{bmatrix}\mapsto\begin{bmatrix}fX+Zp_x\\fY+Zp_y\\Z\end{bmatrix}=\begin{bmatrix}f&0&p_x&0\\0&f&p_y&0\\0&0&1&0\end{bmatrix}\begin{bmatrix}X\\Y\\Z\\1\end{bmatrix}\quad(3)$$
Define
$$K=\begin{bmatrix}f&0&p_x\\0&f&p_y\\0&0&1\end{bmatrix}\quad(4)$$
Then formula (3) has the concise form:
$$\mathbf{x}=K[I\mid 0]\mathbf{X}_{cam}\quad(5)$$
The matrix $K$ is called the camera calibration matrix. In formula (5) we write $(X,Y,Z,1)^T$ as $\mathbf{X}_{cam}$ to emphasize that the camera is placed at the origin of a Euclidean coordinate system with its principal axis pointing along the $z$-axis, and that the point $\mathbf{X}_{cam}$ is expressed in this coordinate system. Such a coordinate system is called the camera coordinate system.

The origin of the camera coordinate system is the camera center, and its $z$-axis points along the principal axis.
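
As a sketch of formulas (3) to (5), the calibration matrix $K$ can be built and applied with NumPy. The values of `f`, `p_x`, and `p_y` below are illustrative assumptions.

```python
import numpy as np

f, p_x, p_y = 2.0, 3.0, 1.5  # illustrative focal length and principal point

# Calibration matrix K of formula (4).
K = np.array([[f,   0.0, p_x],
              [0.0, f,   p_y],
              [0.0, 0.0, 1.0]])
I_0 = np.hstack([np.eye(3), np.zeros((3, 1))])

X_cam = np.array([2.0, 1.0, 4.0, 1.0])  # point in the camera coordinate system
x = K @ I_0 @ X_cam                     # (fX + Z*p_x, fY + Z*p_y, Z), formula (5)
xy = x[:2] / x[2]                       # (fX/Z + p_x, fY/Z + p_y)

print(x.tolist())   # [16.0, 8.0, 4.0]
print(xy.tolist())  # [4.0, 2.0]
```

Compared with the projection without an offset, the image point is simply shifted by $(p_x, p_y)$.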

5. Camera rotation and translation

Generally, points in 3-dimensional space are expressed in a different Euclidean coordinate system, called the world coordinate system. The camera coordinate system and the world coordinate system are related by a rotation and a translation.
[Figure: Euclidean transformation between the world coordinate system and the camera coordinate system]

If $\widetilde{\mathbf{X}}$ is a 3-dimensional non-homogeneous vector representing the coordinates of a point in the world coordinate system, and $\widetilde{\mathbf{X}}_{cam}$ is the same point expressed in the camera coordinate system, then we can write $\widetilde{\mathbf{X}}_{cam}=R(\widetilde{\mathbf{X}}-\widetilde{C})$, where $\widetilde{C}$ is the coordinates of the camera center in the world coordinate system and $R$ is a $3\times3$ rotation matrix representing the orientation of the camera coordinate system. In homogeneous coordinates this equation can be written as:
$$\mathbf{X}_{cam}=\begin{bmatrix}R&-R\widetilde{C}\\0^{T}&1\end{bmatrix}\begin{bmatrix}X\\Y\\Z\\1\end{bmatrix}=\begin{bmatrix}R&-R\widetilde{C}\\0^{T}&1\end{bmatrix}\mathbf{X}\quad(6)$$
Combining this with formula (5) gives:
$$\mathbf{x}=KR[I\mid-\widetilde{C}]\mathbf{X}\quad(7)$$
where $\mathbf{X}$ is now expressed in the world coordinate system. This is the general mapping given by a pinhole camera.
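
The sketch below checks numerically that applying formula (6) followed by formula (5) gives the same image point as the single matrix of formula (7). The rotation (90 degrees about the $z$-axis), the camera center, and the helper name `rotation_z` are all assumptions made for illustration.

```python
import numpy as np


def rotation_z(theta):
    """Rotation matrix for an angle theta (radians) about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])


K = np.array([[2.0, 0.0, 3.0],
              [0.0, 2.0, 1.5],
              [0.0, 0.0, 1.0]])
R = rotation_z(np.pi / 2)
C = np.array([1.0, 0.0, -4.0])      # camera center in world coordinates

X_world = np.array([1.0, 2.0, 0.0])  # non-homogeneous world point

# Two steps: world -> camera (formula 6), then camera -> image (formula 5).
X_cam = R @ (X_world - C)
x_two_step = K @ X_cam

# One step: x = K R [I | -C] X (formula 7).
P = K @ R @ np.hstack([np.eye(3), -C.reshape(3, 1)])
x_one_step = P @ np.append(X_world, 1.0)

print(np.allclose(x_two_step, x_one_step))  # True
```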

6. Camera internal parameters and external parameters

From formula (7) it can be seen that the general pinhole camera $P=KR[I\mid-\widetilde{C}]$ has 9 degrees of freedom: 3 from $K$ (the elements $f$, $p_x$, $p_y$), 3 from $R$, and 3 from $\widetilde{C}$. The parameters contained in $K$ are called the camera internal parameters, or the internal calibration of the camera. The parameters contained in $R$ and $\widetilde{C}$ relate to the orientation and position of the camera in the world coordinate system, and are called the external parameters, or the external calibration.

For convenience, the camera center is often not made explicit, and the transformation from the world coordinate system to the camera coordinate system is expressed as $\widetilde{\mathbf{X}}_{cam}=R\widetilde{\mathbf{X}}+t$. In this case, the camera matrix simplifies to:
$$P=K[R\mid t]\quad(8)$$
where, according to formula (7), $t=-R\widetilde{C}$.
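
As a sketch, the two parameterizations can be compared numerically: $K[R\mid t]$ with $t=-R\widetilde{C}$ should equal $KR[I\mid-\widetilde{C}]$, and the camera center can be recovered as $\widetilde{C}=-R^{T}t$ because a rotation matrix satisfies $R^{-1}=R^{T}$. All numeric values below are illustrative assumptions.

```python
import numpy as np

K = np.array([[2.0, 0.0, 3.0],
              [0.0, 2.0, 1.5],
              [0.0, 0.0, 1.0]])

# Illustrative rotation of 0.3 rad about the y-axis.
c, s = np.cos(0.3), np.sin(0.3)
R = np.array([[c,   0.0, s],
              [0.0, 1.0, 0.0],
              [-s,  0.0, c]])
C = np.array([0.5, -1.0, -3.0])   # camera center in world coordinates

t = -R @ C                        # translation used in formula (8)
P_rt = K @ np.hstack([R, t.reshape(3, 1)])                # K [R | t]
P_c = K @ R @ np.hstack([np.eye(3), -C.reshape(3, 1)])    # K R [I | -C]

C_recovered = -R.T @ t            # invert t = -R C using R^{-1} = R^T

print(np.allclose(P_rt, P_c))         # True
print(np.allclose(C_recovered, C))    # True
```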

7. CCD camera

The basic pinhole model assumes that the image coordinates are Euclidean coordinates with equal scales in both axial directions. However, the pixels of a CCD camera may not be square. If the image coordinates are measured in pixels, unequal scale factors must be introduced in each direction. Specifically, if the number of pixels per unit distance in image coordinates is $m_x$ in the $x$ direction and $m_y$ in the $y$ direction, then the transformation from world coordinates to pixel coordinates is obtained by multiplying formula (4) on the left by an additional factor $\mathrm{diag}(m_x,m_y,1)$. The general form of the calibration matrix of a CCD camera is therefore:
$$K=\begin{bmatrix}a_x&0&x_0\\0&a_y&y_0\\0&0&1\end{bmatrix}\quad(9)$$
where $a_x=fm_x$ and $a_y=fm_y$ represent the focal length of the camera in pixel dimensions in the $x$ and $y$ directions respectively. Similarly, $\widetilde{x}_0=(x_0,y_0)^T$ is the principal point in pixel dimensions, with coordinates $x_0=m_xp_x$ and $y_0=m_yp_y$. A CCD camera therefore has 10 degrees of freedom: 4 from $K$ (the elements $a_x$, $a_y$, $x_0$, $y_0$), 3 from $R$, and 3 from $\widetilde{C}$.
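
The construction of formula (9) can be sketched as follows. The focal length, pixel densities, and principal point are made-up values chosen so that the arithmetic is exact; real cameras will differ.

```python
import numpy as np

f = 2.0                    # focal length in metric units (illustrative)
p_x, p_y = 0.5, 0.75       # principal point in the same metric units
m_x, m_y = 500.0, 400.0    # pixels per unit distance; m_x != m_y means non-square pixels

# Calibration matrix of formula (4), in metric units.
K_metric = np.array([[f,   0.0, p_x],
                     [0.0, f,   p_y],
                     [0.0, 0.0, 1.0]])

# Left-multiplying by diag(m_x, m_y, 1) converts to pixel units, formula (9).
K_pixel = np.diag([m_x, m_y, 1.0]) @ K_metric

print(K_pixel.tolist())
# [[1000.0, 0.0, 250.0], [0.0, 800.0, 300.0], [0.0, 0.0, 1.0]]
```

The entries match $a_x=fm_x$, $a_y=fm_y$, $x_0=m_xp_x$, and $y_0=m_yp_y$.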

Origin blog.csdn.net/m0_37605642/article/details/135158486