Converting between the world, camera, and pixel coordinate systems, explained in detail (with code)

For an introduction to the coordinate systems themselves and a review of camera intrinsics and extrinsics, see the previous article.
This article mainly explains how to convert between these coordinate systems.

This article covers:

  1. Using the camera intrinsics to convert between the pixel coordinate system and the camera coordinate system.
  2. Using the camera extrinsics (pose) to convert between the camera coordinate system and the world coordinate system.
  3. How to use extrinsics given in the form (qw, qx, qy, qz, tx, ty, tz).
  4. A concrete scenario, with each step explained in detail and accompanied by code.

Take the following scenario as an example.
Suppose there is a point p1 on image I1 (img1). We want to map p1 to the corresponding point p2 on image I2 (img2) using the intrinsics and extrinsics of camera 1 and camera 2.
We also need the depth at p1. Assuming a depth map of img1 is available, the depth at p1 can be read from it.

(Figure: point P in the world, observed by camera 1 as p1 on I1 and by camera 2 as p2 on I2.)

The whole idea:

p1 lies on image I1, i.e. in the pixel coordinate system. Using camera 1's intrinsics, transfer it to camera 1's camera coordinate system, obtaining (xc1, yc1, zc1).
Using camera 1's extrinsics, transfer (xc1, yc1, zc1) to the world coordinate system, obtaining the coordinates (xw1, yw1, zw1) of point P in the figure above.
Using camera 2's extrinsics, transfer point P (xw1, yw1, zw1) into camera 2's camera coordinate system, obtaining (xc2, yc2, zc2).
Finally, using camera 2's intrinsics, convert (xc2, yc2, zc2) to the pixel coordinate system, obtaining the coordinates (x2, y2) of point p2 on image I2.

The conversion relationship of the entire coordinate system: Pixel 1 -> Camera 1 -> World -> Camera 2 -> Pixel 2
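In compact form (notation mine, not from the original: $\pi_{K}$ denotes pinhole projection with intrinsics $K$, and $\pi_{K}^{-1}$ the back-projection using the known depth $z_{c1}$), the chain reads:

$$p_{2} = \pi_{K_{2}}\left( T_{2}\, T_{1}^{-1}\, \pi_{K_{1}}^{-1}(p_{1}, z_{c1}) \right)$$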

Among them, the pixel coordinate system is 2D and the others are 3D.
Camera extrinsics are also called the pose.

Specific steps:

(1). p1 pixel coordinates --> camera 1 coordinates

The relationship between these two coordinate systems is determined by the camera intrinsics (fx, fy, cx, cy).
Let the pixel coordinates be (x1, y1) and the camera 1 coordinates be (xc1, yc1, zc1), where zc1 is the value of I1's depth map at (x1, y1). Then

$$x_{1} = f_{x}\frac{x_{c1}}{z_{c1}} + c_{x}, \qquad y_{1} = f_{y}\frac{y_{c1}}{z_{c1}} + c_{y} \tag{1}$$

We now need xc1 and yc1; solving (1) gives

$$x_{c1} = (x_{1} - c_{x}) \cdot z_{c1} / f_{x}, \qquad y_{c1} = (y_{1} - c_{y}) \cdot z_{c1} / f_{y}$$

Code:

import cv2

depth1_ori = cv2.imread("depth1.png", -1)  # read as-is: a uint16 depth image
depth1 = cv2.split(depth1_ori)[0]
# camera coordinates corresponding to point p1
zc1 = depth1[y1, x1] / 1000.0  # depth here is in mm; convert to meters
xc1 = (x1 - cx) * zc1 / fx
yc1 = (y1 - cy) * zc1 / fy
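One practical caveat (an assumption about typical depth maps, not something from the original): pixels with no measurement are often stored as 0, so it is worth guarding before dividing by zc1 later:

# a zero value usually marks missing depth in real depth maps
if zc1 <= 0:
    raise ValueError("no valid depth at (x1, y1)")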

(2). Camera 1 coordinates of p1 --> World coordinates

Conversion relationship: camera coordinates = T · world coordinates, and world coordinates = T⁻¹ · camera coordinates,
where T is the world -> camera transformation matrix.

How do we find the transformation matrix T? Let's start with some concepts.

Rotation matrix R: 3 * 3 matrix
Translation vector t: 3 * 1 vector
Putting R and t together gives the transformation matrix T, a 4 * 4 matrix:

$$T = \begin{bmatrix} R & t \\ 0^{T} & 1 \end{bmatrix}$$
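As a minimal numpy sketch (the helper name make_T is mine, not from the original), packing R and t into T looks like:

import numpy as np

def make_T(R, t):
    # assemble the 4x4 transform from a 3x3 rotation and a length-3 translation
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T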

As an aside, in terms of Lie groups and Lie algebras, T lives in SE(3) and R in SO(3).

Back to coordinates. Let (xc1, yc1, zc1) be camera 1 coordinates and (xw, yw, zw) be world coordinates; then world coordinates to camera coordinates is:

$$\begin{bmatrix} x_{c1} \\ y_{c1} \\ z_{c1} \\ 1 \end{bmatrix} = T \cdot \begin{bmatrix} x_{w} \\ y_{w} \\ z_{w} \\ 1 \end{bmatrix}$$

You may well wonder: why add the extra dimension?
If T does not get the last row $\begin{bmatrix} 0^{T} & 1 \end{bmatrix}$ and the coordinates do not get the trailing 1, we can still compute world -> camera directly with $T = \begin{bmatrix} R & t \end{bmatrix}$:

$$\begin{bmatrix} x_{c} \\ y_{c} \\ z_{c} \end{bmatrix} = \begin{bmatrix} R & t \end{bmatrix} \cdot \begin{bmatrix} x_{w} \\ y_{w} \\ z_{w} \\ 1 \end{bmatrix} = R \begin{bmatrix} x_{w} \\ y_{w} \\ z_{w} \end{bmatrix} + t$$

So why insist on the extra row? This form works for world -> camera 1, but what if we want to go the other way, from camera 1 coordinates to world coordinates?
(That is exactly what we need to do now: convert p1's camera 1 coordinates to world coordinates.)

Then we would need to compute

$$\begin{bmatrix} x_{w} \\ y_{w} \\ z_{w} \end{bmatrix} = T^{-1} \cdot \begin{bmatrix} x_{c} \\ y_{c} \\ z_{c} \end{bmatrix}$$

but with $T = \begin{bmatrix} R & t \end{bmatrix}$ the inverse cannot be found: only a square matrix (number of rows = number of columns) has an inverse, and a 3 * 4 matrix does not.

So we append one row to make T a 4 * 4 matrix:

$$T = \begin{bmatrix} R & t \\ 0^{T} & 1 \end{bmatrix}$$

Then camera coordinates --> world coordinates becomes:

$$\begin{bmatrix} x_{w} \\ y_{w} \\ z_{w} \\ 1 \end{bmatrix} = T^{-1} \cdot \begin{bmatrix} x_{c} \\ y_{c} \\ z_{c} \\ 1 \end{bmatrix}$$
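A useful consequence of the block structure (a sketch under the same assumptions; the helper name invert_T is mine): the inverse of a rigid transform has the closed form $\begin{bmatrix} R^{T} & -R^{T}t \\ 0^{T} & 1 \end{bmatrix}$, so it can be formed without a general matrix inversion:

import numpy as np

def invert_T(T):
    # closed-form inverse of a rigid transform: [R^T, -R^T t; 0, 1]
    R, t = T[:3, :3], T[:3, 3]
    Ti = np.eye(4)
    Ti[:3, :3] = R.T
    Ti[:3, 3] = -R.T @ t
    return Ti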

Some programs use names like Twc and Tcw, where w refers to world (the world coordinate system) and c refers to camera (the camera coordinate system).
T denotes the transformation matrix. As for whether Twc means world -> camera or camera -> world, it depends on the actual convention (each developer has different habits).

In practice, at this point we still do not know how to compute T. What is missing?

The camera extrinsics we get are usually in the form of a quaternion plus a translation vector; there is no R matrix in them.
Camera extrinsics: (qw, qx, qy, qz, tx, ty, tz). (This order depends on the actual source; some cameras use a different order.)

The quaternion q = (qw, qx, qy, qz) is used here instead of the R matrix.
The reason is that R is a 3 * 3 matrix with 9 entries while a rotation has only 3 degrees of freedom, so the matrix representation is redundant; the quaternion representation is more compact.
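One thing worth checking in code (a small sketch, not from the original): a rotation quaternion must be unit-norm, so extrinsics read from a file are often normalized first:

import numpy as np

q = np.array([qw, qx, qy, qz])
q = q / np.linalg.norm(q)  # rotation quaternions must have unit norm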

The above are the related concepts involved. Now let’s start calculating T.

Compute the transformation matrix T

We first convert q to R, and then build T from R and t.
q = (qw, qx, qy, qz). (It must be in the order qw, qx, qy, qz; if not, reorder it first.)
t = (tx, ty, tz). Pay attention to the unit of t here: if it is in mm, divide by 1000.0.
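If you are working in Python, one alternative to the hand-written formula further below is scipy's Rotation class; note that scipy.spatial.transform.Rotation.from_quat expects scalar-last order (qx, qy, qz, qw), not the (qw, ...) order above:

from scipy.spatial.transform import Rotation

# scipy uses scalar-last (x, y, z, w) quaternion order
R1 = Rotation.from_quat([qx, qy, qz, qw]).as_matrix()  # 3x3 rotation matrix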

If you use the Eigen library, you can obtain T like this.
Eigen::Isometry3d is a 4 * 4 Euclidean transformation matrix, which is exactly the form of T. Note that Eigen::Quaterniond's constructor takes the scalar first, i.e. (qw, qx, qy, qz).

Eigen::Quaterniond q(qw, qx, qy, qz);
Eigen::Isometry3d T(q);
// The rotation was set first; the translation below is applied in the pre-rotation frame, hence pretranslate
T.pretranslate(Eigen::Vector3d(tx, ty, tz));

If using Sophus::SE3d:

SE3d T = SE3d(Quaterniond(qw, qx, qy, qz),
              Vector3d(tx, ty, tz));

If computing it directly, the formula from quaternion q to rotation matrix R is as follows, where q0, q1, q2, q3 correspond to qw, qx, qy, qz respectively:

$$R = \begin{bmatrix} 1-2q_{2}^{2}-2q_{3}^{2} & 2q_{1}q_{2}-2q_{0}q_{3} & 2q_{1}q_{3}+2q_{0}q_{2} \\ 2q_{1}q_{2}+2q_{0}q_{3} & 1-2q_{1}^{2}-2q_{3}^{2} & 2q_{2}q_{3}-2q_{0}q_{1} \\ 2q_{1}q_{3}-2q_{0}q_{2} & 2q_{2}q_{3}+2q_{0}q_{1} & 1-2q_{1}^{2}-2q_{2}^{2} \end{bmatrix}$$

Combining this with (tx, ty, tz) and appending the row $\begin{bmatrix} 0^{T} & 1 \end{bmatrix}$ at the bottom gives T1 (built from the extrinsics of camera 1).

import numpy as np

T1 = np.array(
    [
        [
            1 - 2 * q2 ** 2 - 2 * q3 ** 2,
            2 * q1 * q2 - 2 * q0 * q3,
            2 * q1 * q3 + 2 * q0 * q2,
            tx,  # mind the unit: if mm, divide by 1000.0
        ],
        [
            2 * q1 * q2 + 2 * q0 * q3,
            1 - 2 * q1 ** 2 - 2 * q3 ** 2,
            2 * q2 * q3 - 2 * q0 * q1,
            ty,  # mind the unit: if mm, divide by 1000.0
        ],
        [
            2 * q1 * q3 - 2 * q0 * q2,
            2 * q2 * q3 + 2 * q0 * q1,
            1 - 2 * q1 ** 2 - 2 * q2 ** 2,
            tz,  # mind the unit: if mm, divide by 1000.0
        ],
        [0, 0, 0, 1],
    ])
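Since the same construction is needed again for camera 2 in step (3), it can be wrapped in a small helper (the name pose_to_T is mine, not from the original; it assumes scalar-first quaternion order and t already in meters):

import numpy as np

def pose_to_T(qw, qx, qy, qz, tx, ty, tz):
    # build the 4x4 world->camera transform from quaternion + translation
    q0, q1, q2, q3 = qw, qx, qy, qz
    return np.array([
        [1 - 2*q2**2 - 2*q3**2, 2*q1*q2 - 2*q0*q3,     2*q1*q3 + 2*q0*q2,     tx],
        [2*q1*q2 + 2*q0*q3,     1 - 2*q1**2 - 2*q3**2, 2*q2*q3 - 2*q0*q1,     ty],
        [2*q1*q3 - 2*q0*q2,     2*q2*q3 + 2*q0*q1,     1 - 2*q1**2 - 2*q2**2, tz],
        [0, 0, 0, 1],
    ])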

With T1 in hand, we can now convert camera coordinates to world coordinates:

$$\begin{bmatrix} x_{w} \\ y_{w} \\ z_{w} \\ 1 \end{bmatrix} = T_{1}^{-1} \cdot \begin{bmatrix} x_{c1} \\ y_{c1} \\ z_{c1} \\ 1 \end{bmatrix}$$

Code:

# homogeneous camera-1 coordinates of p1
p1_c = np.array([xc1, yc1, zc1, 1])
# world coordinates: p_w = T1^-1 * p1_c
p_w = np.matmul(np.linalg.inv(T1), np.expand_dims(p1_c, 1))

(3). World coordinates --> Camera 2 coordinates

We explained above how to convert from world coordinates to camera coordinates.
Note that the T1 obtained above comes from the extrinsics of camera 1.
Here we use the extrinsics of camera 2: (qw2, qx2, qy2, qz2, tx2, ty2, tz2), and build T2 from them in exactly the same way as T1.
Once T2 is obtained, the camera 2 coordinates of P follow from:

$$\begin{bmatrix} x_{c2} \\ y_{c2} \\ z_{c2} \\ 1 \end{bmatrix} = T_{2} \cdot \begin{bmatrix} x_{w} \\ y_{w} \\ z_{w} \\ 1 \end{bmatrix}$$
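Assuming the hypothetical pose_to_T helper sketched earlier, T2 can be built first:

T2 = pose_to_T(qw2, qx2, qy2, qz2, tx2, ty2, tz2)  # hypothetical helper from above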

p2_c = np.matmul(T2, p_w)

(4). Camera 2 coordinates --> p2 pixel coordinates

The camera intrinsics here are camera 2's (fx, fy, cx, cy):

$$x_{2} = f_{x}\frac{x_{c2}}{z_{c2}} + c_{x}, \qquad y_{2} = f_{y}\frac{y_{c2}}{z_{c2}} + c_{y}$$

# p2_c has shape (4, 1); take the first three components as scalars
xc2, yc2, zc2 = p2_c[:3, 0]
x2 = xc2 * fx / zc2 + cx
y2 = yc2 * fy / zc2 + cy

This gives the coordinates of the mapped point p2 on image I2.
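Putting the four steps together, here is a minimal end-to-end sketch (the function name map_point and its signature are mine, not from the original; it assumes both T matrices are world -> camera transforms and the depth is already in meters):

import numpy as np

def map_point(x1, y1, zc1, K1, T1, K2, T2):
    # K = (fx, fy, cx, cy); T = 4x4 world->camera transform
    fx1, fy1, cx1, cy1 = K1
    fx2, fy2, cx2, cy2 = K2
    # (1) pixel 1 -> camera 1 (homogeneous coordinates)
    p_c1 = np.array([(x1 - cx1) * zc1 / fx1,
                     (y1 - cy1) * zc1 / fy1,
                     zc1, 1.0])
    # (2)+(3) camera 1 -> world -> camera 2
    p_c2 = T2 @ np.linalg.inv(T1) @ p_c1
    # (4) camera 2 -> pixel 2
    x2 = fx2 * p_c2[0] / p_c2[2] + cx2
    y2 = fy2 * p_c2[1] / p_c2[2] + cy2
    return x2, y2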


Origin: blog.csdn.net/level_code/article/details/134437269