Machine Vision Model - Projection Matrix

1 Overview

Machine vision uses machines in place of human eyes and brains for measurement and judgment. The basic workflow of a machine vision system is to acquire an image of a target, perform analysis on it (recognition, feature extraction, classification, and other mathematical operations), and then control a system or make decisions based on the results of that analysis.
Many machine vision applications require machine vision measurement, that is, recovering the physical position of a target in real space from its image, for example in manipulator grasping, walking robots, and SLAM.
To obtain a target's physical position from its pixel position in the image, we first need a mapping between image pixel coordinates and physical space coordinates; in other words, we abstract the optical imaging process into a mathematical formula. The formula that expresses how a spatial position maps to a pixel position in the image is the so-called machine vision imaging model. This article discusses how this model is constructed.

2 Pinhole Imaging

Machine vision imaging adopts the pinhole imaging model, as shown in the figure below.
Figure 1:

Simplified again to the following figure
Figure 2:
In the figure, $X$ is a point in space, $x$ is the imaging point of that space point in the image, and $C$ is the optical centre of the lens (camera centre). As the figure shows, the three points $C$, $x$, and $X$ are collinear.
The distance from the optical centre $C$ to the image plane is the focal length $f$.
The following coordinate systems and their interrelationships are derived based on this pinhole imaging model.

3 Coordinate Systems

When it comes to machine vision measurement models, it is necessary to first understand several coordinate systems involved in the entire model.

3.1 Pixel coordinate system uov

That is, the coordinate system in which each pixel of the image sits, denoted $uov$, as shown in the figure below.
Figure 3:
This is a two-dimensional coordinate system: the abscissa runs along the image width, the ordinate along the image height, the origin is at the upper-left corner, and the axis unit is the pixel, corresponding one-to-one with the pixels of the image.

3.2 Image coordinate system xoy

That is, the coordinate system of the image sensor (e.g. CMOS, CCD), denoted $xoy$, as shown in the figure below.
Figure 4:
This is also a two-dimensional coordinate system: the abscissa runs along the sensor's width, the ordinate along its height, the origin is at the centre of the sensor, and the axis unit is mm (choose m, mm, ... according to actual needs; the coordinate systems below use the same unit, which will not be repeated).
Combining this with the pixel coordinate system gives the following figure.
Figure 5:
From this figure, we can get the mapping relationship between the pixel coordinate system uov and the image coordinate system xoy, namely:
$$u = \frac{x}{dx} + u_0,\qquad v = -\frac{y}{dy} + v_0$$
where:
$u_0$, $v_0$ — the pixel coordinates of the image centre (usually half the horizontal and vertical resolution of the image, but not exactly half if the lens and sensor are misaligned), in pixels;
$dx$, $dy$ — the horizontal and vertical dimensions of a sensor cell (i.e. the pixel size), in mm/pixel; the pixels are usually square, so $dx = dy$.
Writing the above formula in homogeneous matrix form gives the conversion relationship between the pixel coordinate system and the image coordinate system:
$$\left[\begin{matrix} u\\v\\1 \end{matrix}\right]= \left[\begin{matrix} 1/dx&0&u_0\\0&-1/dy&v_0\\0&0&1 \end{matrix}\right] \left[\begin{matrix} x\\y\\1 \end{matrix}\right]$$
Note the negative sign in the y direction: the pixel coordinate system $uov$ is left-handed, while the three-dimensional coordinate systems discussed later are right-handed, so the image coordinate system $xoy$ is made right-handed here by flipping the y-axis direction with a negative sign.
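The pixel-image conversion above can be sketched in a few lines of code. The pixel size and image centre below are illustrative assumptions (a 640×480 sensor with 5 µm pixels), not values from the article.

```python
# Minimal sketch of the pixel <-> image coordinate conversion.
dx = dy = 0.005    # pixel size in mm/pixel (assumed)
u0, v0 = 320, 240  # image centre in pixels (assumed 640x480 sensor)

def image_to_pixel(x, y):
    """Map image coordinates (mm, origin at sensor centre) to pixel coordinates."""
    u = x / dx + u0
    v = -y / dy + v0   # minus sign: uov is left-handed, xoy is right-handed
    return u, v

def pixel_to_image(u, v):
    """Inverse mapping: pixel coordinates back to image coordinates (mm)."""
    x = (u - u0) * dx
    y = -(v - v0) * dy
    return x, y

# The sensor centre maps to the centre pixel:
print(image_to_pixel(0.0, 0.0))  # (320.0, 240.0)
```

The two functions are exact inverses of each other, so a round trip through them recovers the original image coordinates (up to floating-point error).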

3.3 Camera coordinate system $O_CX_CY_CZ_C$

Set up a three-dimensional coordinate system on the camera lens, as shown in the figure below: the origin is at the optical centre, the X and Y axes are parallel to the x and y axes of the image coordinate system, and the Z axis points toward the object side. (Note: this is the camera coordinate system definition I am used to; another, more common definition also has Z pointing toward the object space but with X and Y opposite to those in my figure, which has the advantage of avoiding the two negative signs in my intrinsic matrix.)
Figure 6:
From the pinhole imaging model above, we can derive the projection relationship in the $Y_COZ_C$ plane, as shown in the figure below (the $X_COZ_C$ plane is analogous).
Figure 7:
In the figure above, by similar triangles, $\frac{f}{Z_C}=\frac{y}{Y_C}$ and $\frac{f}{Z_C}=\frac{x}{X_C}$, so we can write the conversion relationship between the camera coordinate system and the image coordinate system, directly in homogeneous coordinate form:
$$\left[\begin{matrix} x\\y\\1 \end{matrix}\right]= \left[\begin{matrix} f/Z_C&0&0&0\\0&f/Z_C&0&0\\0&0&1/Z_C&0 \end{matrix}\right] \left[\begin{matrix} X_C\\Y_C\\Z_C\\1 \end{matrix}\right]=\frac{1}{Z_C}\left[\begin{matrix} f&0&0&0\\0&f&0&0\\0&0&1&0 \end{matrix}\right] \left[\begin{matrix} X_C\\Y_C\\Z_C\\1 \end{matrix}\right]$$
where: $f$ — the focal length of the lens; some formulas in the literature split the focal length into $f_x$ and $f_y$ for the X and Y directions.
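The similar-triangle relationship can be sketched directly: a camera-frame point $(X_C, Y_C, Z_C)$ maps to $(x, y) = (fX_C/Z_C,\, fY_C/Z_C)$ on the image plane. The focal length below is an assumed illustrative value.

```python
# Minimal sketch of the similar-triangle (perspective) projection.
f = 8.0  # focal length in mm (assumed)

def project_to_image_plane(Xc, Yc, Zc):
    """Perspective projection of a camera-frame point onto the image plane (mm)."""
    if Zc <= 0:
        raise ValueError("point must be in front of the camera")
    return f * Xc / Zc, f * Yc / Zc

# A point twice as far away projects half as large:
print(project_to_image_plane(100.0, 50.0, 1000.0))
print(project_to_image_plane(100.0, 50.0, 2000.0))
```

The division by $Z_C$ is what makes the projection non-linear in Euclidean coordinates, which is why the formulas below switch to homogeneous coordinates.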
Substituting this into the conversion between the pixel coordinate system and the image coordinate system above:
$$\left[\begin{matrix} u\\v\\1 \end{matrix}\right] = \left[\begin{matrix} 1/dx&0&u_0\\0&-1/dy&v_0\\0&0&1 \end{matrix}\right] \left[\begin{matrix} x\\y\\1 \end{matrix}\right]=\frac{1}{Z_C} \left[\begin{matrix} 1/dx&0&u_0\\0&-1/dy&v_0\\0&0&1 \end{matrix}\right] \left[\begin{matrix} f&0&0&0\\0&f&0&0\\0&0&1&0 \end{matrix}\right] \left[\begin{matrix} X_C\\Y_C\\Z_C\\1 \end{matrix}\right]$$
We can get the conversion relationship between the camera coordinate system and the pixel coordinate system as follows
$$\left[\begin{matrix} u\\v\\1 \end{matrix}\right]=\frac{1}{Z_C} \left[\begin{matrix} f/dx&0&u_0&0\\0&-f/dy&v_0&0\\0&0&1&0 \end{matrix}\right] \left[\begin{matrix} X_C\\Y_C\\Z_C\\1 \end{matrix}\right]$$
We denote the matrix in this formula by $M_1$. At the same time, we can see from Figure 6 above that the image is actually observed from the back of the image sensor in the figure, so a negative sign must also be added in the X direction:
$$M_1=\left[\begin{matrix} -f/dx&0&u_0&0\\0&-f/dy&v_0&0\\0&0&1&0 \end{matrix}\right]$$
The parameters in this matrix depend only on the lens focal length $f$, the pixel size $dx$, $dy$, and the centre pixel $u_0$, $v_0$. These are intrinsic parameters of the camera and lens: once the camera and lens are chosen, the matrix is fixed, so it is called the intrinsic parameter matrix.
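Building $M_1$ and projecting a camera-frame point with it can be sketched as below; the focal length, pixel size, and centre pixel are the same illustrative assumptions as before, not values from the article.

```python
import numpy as np

# Sketch of the intrinsic matrix M1 (with the two negative signs from the text).
f = 8.0            # focal length, mm (assumed)
dx = dy = 0.005    # pixel size, mm/pixel (assumed)
u0, v0 = 320, 240  # centre pixel (assumed)

M1 = np.array([[-f/dx,   0.0,  u0, 0.0],
               [  0.0, -f/dy,  v0, 0.0],
               [  0.0,   0.0, 1.0, 0.0]])

def camera_to_pixel(Xc, Yc, Zc):
    """Project a camera-frame point to pixel coordinates via M1."""
    uvw = M1 @ np.array([Xc, Yc, Zc, 1.0])
    return uvw[:2] / uvw[2]   # the third component is Z_C; divide it out

# A point on the optical axis lands on the centre pixel:
print(camera_to_pixel(0.0, 0.0, 1000.0))  # [320. 240.]
```

Note that the $1/Z_C$ factor appears here as the final division by the third homogeneous component, mirroring the formula above.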

3.4 World coordinate system $O_WX_WY_WZ_W$

The world coordinate system is the absolute coordinate system of the system, and it is also a three-dimensional coordinate system. The origin and coordinate axis directions are selected according to our needs.
Figure 8:
As a rigid body, the camera has a pose in the world coordinate system, i.e. a position and an orientation. The position is the translation of the camera (the origin of the camera coordinate system) relative to the origin of the world coordinate system, expressed by a 3×1 translation vector $T_C$; the orientation is the rotation of the camera coordinate system relative to the world coordinate system, expressed by a 3×3 rotation matrix $R_C$.
Then we can get the relationship between the camera coordinate system and the world coordinate system
$$\left[\begin{matrix} X_W\\Y_W\\Z_W\\1 \end{matrix}\right]= \left[\begin{matrix} R_C&T_C\\0_{1\times 3}&1 \end{matrix}\right] \left[\begin{matrix} X_C\\Y_C\\Z_C\\1 \end{matrix}\right]$$
Conversely,
$$\left[\begin{matrix} X_C\\Y_C\\Z_C\\1 \end{matrix}\right]= \left[\begin{matrix} R_C&T_C\\0_{1\times 3}&1 \end{matrix}\right]^{-1} \left[\begin{matrix} X_W\\Y_W\\Z_W\\1 \end{matrix}\right]=M_2\left[\begin{matrix} X_W\\Y_W\\Z_W\\1 \end{matrix}\right]$$
This is the transformation relationship between the world coordinate system and the camera coordinate system. The matrix $M_2$ depends on the camera's pose and is called the extrinsic parameter matrix.
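The extrinsic matrix is the inverse of the camera pose matrix $[R_C\ T_C; 0\ 1]$. A quick sketch, using an assumed example pose (a 90° rotation about Z plus a translation; these numbers are purely illustrative):

```python
import numpy as np

# Assumed example pose: camera rotated 90 degrees about Z, translated along X.
theta = np.pi / 2
R_C = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                [np.sin(theta),  np.cos(theta), 0.0],
                [0.0,            0.0,           1.0]])
T_C = np.array([10.0, 0.0, 0.0])

pose = np.eye(4)          # [R_C  T_C]
pose[:3, :3] = R_C        # [0    1  ]
pose[:3, 3] = T_C

M2 = np.linalg.inv(pose)  # extrinsic matrix: world -> camera

# For a rigid transform the inverse has the closed form [R^T | -R^T T]:
expected = np.eye(4)
expected[:3, :3] = R_C.T
expected[:3, 3] = -R_C.T @ T_C
print(np.allclose(M2, expected))  # True

# Map a world point into the camera frame:
Pw = np.array([10.0, 5.0, 2.0, 1.0])
print(M2 @ Pw)
```

In practice the closed-form inverse is preferred over `np.linalg.inv` for rigid transforms, since it is cheaper and exactly preserves the rotation structure.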
Substituting the previous conversion formula between the pixel coordinate system and the camera coordinate system, we get
$$\left[\begin{matrix} u\\v\\1 \end{matrix}\right]=\frac{1}{Z_C} \left[\begin{matrix} -f/dx&0&u_0&0\\0&-f/dy&v_0&0\\0&0&1&0 \end{matrix}\right] \left[\begin{matrix} X_C\\Y_C\\Z_C\\1 \end{matrix}\right]=\frac{1}{Z_C} \left[\begin{matrix} -f/dx&0&u_0&0\\0&-f/dy&v_0&0\\0&0&1&0 \end{matrix}\right]M_2 \left[\begin{matrix} X_W\\Y_W\\Z_W\\1 \end{matrix}\right]$$

4 Machine Vision Projection Matrix

So far, we have obtained the mapping relationship between the pixel coordinate system and the world coordinate system, that is, the machine vision projection matrix:
$$\left[\begin{matrix} u\\v\\1 \end{matrix}\right]=\frac{1}{Z_C} M_1M_2 \left[\begin{matrix} X_W\\Y_W\\Z_W\\1 \end{matrix}\right]$$
where:
$Z_C$ — the Z coordinate of the space point in the camera coordinate system;
$M_1$ — the intrinsic matrix, a 3×4 matrix, $M_1=\left[\begin{matrix} -f/dx&0&u_0&0\\0&-f/dy&v_0&0\\0&0&1&0 \end{matrix}\right]$;
$M_2$ — the extrinsic matrix, a 4×4 matrix, $M_2=\left[\begin{matrix} r_{11}&r_{12}&r_{13}&t_x\\ r_{21}&r_{22}&r_{23}&t_y\\ r_{31}&r_{32}&r_{33}&t_z\\ 0&0&0&1 \end{matrix}\right]$.
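The full chain can be sketched end to end. The intrinsics are the same illustrative assumptions as before, and the camera is placed at the world origin looking along +Z so that $M_2$ is the identity; any other pose would simply substitute a different $M_2$.

```python
import numpy as np

# End-to-end sketch: pixel = (1/Z_C) * M1 * M2 * world (homogeneous).
f, dx, dy = 8.0, 0.005, 0.005  # focal length (mm), pixel size (mm/px), assumed
u0, v0 = 320, 240              # centre pixel, assumed

M1 = np.array([[-f/dx,   0.0,  u0, 0.0],
               [  0.0, -f/dy,  v0, 0.0],
               [  0.0,   0.0, 1.0, 0.0]])

M2 = np.eye(4)  # camera at the world origin: extrinsic matrix is the identity

def world_to_pixel(Pw):
    """Project a homogeneous world point to pixel coordinates."""
    Pc = M2 @ Pw             # world frame -> camera frame
    uvw = M1 @ Pc            # camera frame -> scaled pixel coordinates
    return uvw[:2] / uvw[2]  # divide by Z_C

# A world point on the optical axis lands on the centre pixel:
print(world_to_pixel(np.array([0.0, 0.0, 500.0, 1.0])))  # [320. 240.]
```

Note that the projection only determines the pixel up to the unknown depth $Z_C$; recovering 3D positions from a single pixel therefore needs extra constraints (a second camera, a known plane, etc.).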
Now that the model is established, how do we obtain its parameters? That is the subject of the next topic: "Machine Vision - Camera Calibration".

Origin blog.csdn.net/hangl_ciom/article/details/106082794