# I. Introduction

The vision system involves four coordinate systems: the pixel coordinate system (u, v), the image coordinate system (x, y), the camera coordinate system (Xc, Yc, Zc), and the world coordinate system (Xw, Yw, Zw), as shown in the figure below. The four systems are related to one another, so locating a point in the world coordinate system from its image pixel coordinates is exactly the problem that camera calibration solves. The key algorithmic part is the chain of coordinate-system conversions, and each conversion is expressed using homogeneous coordinates.

# II. Coordinate System Transformation

### 2.1 Conversion of pixel coordinate system and image coordinate system

Pixel coordinates describe a pixel's position in the image; the unit is the pixel. The pixel coordinate system is defined on the image itself and must ultimately be related to the camera coordinate system. Its origin Op is the top-left vertex of the image, the u axis points horizontally to the right, and the v axis points vertically downward.

The unit of the image coordinate system is the millimeter. Pixel coordinates alone cannot express the physical size of a point in the picture, which is why the image coordinate system is needed. Its origin Oi is the intersection of the optical axis and the imaging plane.

Both coordinate systems lie on the imaging plane, but their origins and units differ. The conversion requires two parameters, dx and dy, which are the physical dimensions of one pixel on the photosensitive chip (mm per pixel in the x and y directions).

The conversion between the two is a unit conversion. Assuming the pixel coordinates of the image center (the principal point) are (u0, v0), the relationship between the image coordinates (x, y) and the pixel coordinates (u, v) can be expressed as:

$$u = \frac{x}{dx} + u_0, \qquad v = \frac{y}{dy} + v_0$$

Converted to homogeneous coordinates:

$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} 1/dx & 0 & u_0 \\ 0 & 1/dy & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$
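As a quick numeric check of this conversion, here is a minimal sketch; the values of dx, dy, u0, and v0 are illustrative assumptions, not taken from any real camera:

```python
import numpy as np

# Illustrative parameters (assumed for this sketch, not from the article):
dx, dy = 0.005, 0.005   # physical size of one pixel, in mm
u0, v0 = 320.0, 240.0   # principal point, in pixels

# Homogeneous matrix mapping image coordinates (mm) to pixel coordinates
M = np.array([
    [1 / dx, 0.0,    u0],
    [0.0,    1 / dy, v0],
    [0.0,    0.0,    1.0],
])

xy1 = np.array([1.0, -0.5, 1.0])  # image point (x, y) = (1 mm, -0.5 mm), homogeneous
u, v, _ = M @ xy1                 # u = x/dx + u0, v = y/dy + v0
```

With these assumed values, a point 1 mm right and 0.5 mm above the principal point lands 200 pixels right of and 100 pixels above (u0, v0).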

### 2.2 Conversion between camera coordinate system and image coordinate system

The camera coordinate system takes the camera's optical axis as its Zc axis, with the origin Oc at the optical center of the camera's optical system (in practice, the center of the lens). The Xc and Yc axes of the camera coordinate system are parallel to the x and y axes of the image coordinate system, and the distance OcOi along the optical axis is the focal length f.

In the figure above, P is the image point on the imaging plane and B is the object point in the camera coordinate system. Their relationship follows from similar triangles:

$$x = f\,\frac{X_c}{Z_c}, \qquad y = f\,\frac{Y_c}{Z_c}$$

which in homogeneous coordinates becomes:

$$Z_c \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix}$$

The last expression shows that the camera coordinate system is three-dimensional while the image coordinate system is two-dimensional, so the change in dimension can only be handled through a homogeneous coordinate transformation.
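The similar-triangle projection can be sketched numerically as follows; the focal length and the object point are assumed values for illustration:

```python
import numpy as np

f = 8.0  # focal length in mm (an assumed value for illustration)

# Projection: Zc * [x, y, 1]^T = P @ [Xc, Yc, Zc, 1]^T
P = np.array([
    [f,   0.0, 0.0, 0.0],
    [0.0, f,   0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
])

B = np.array([100.0, 50.0, 1000.0, 1.0])  # object point in camera coords (mm)
proj = P @ B                              # equals Zc * (x, y, 1)
x, y = proj[0] / proj[2], proj[1] / proj[2]
```

Dividing by the third homogeneous component recovers x = f·Xc/Zc and y = f·Yc/Zc on the image plane.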

So why is the image plane drawn in front of the camera's optical center?

In the schematic diagram the image plane sits on the object side of the lens, so during the derivation the light appears to pass through the imaging plane first and then converge to a point at the optical center. The reason is that the derivation replaces the real image plane (which lies behind the lens and receives an inverted image) with a virtual image plane placed in front of the optical center at the same distance f; this gives an upright image with exactly the same geometry and simpler sign conventions.

### 2.3 Conversion between world coordinate system and camera coordinate system

The world coordinate system is the reference frame for the target object's position. It can be placed wherever is convenient for the calculation, and its unit is a length unit such as mm. The conversion from the world coordinate system to the camera coordinate system consists of a rotation and a translation (in fact, any such motion can be described by a rotation matrix and a translation vector). Because the world and camera coordinate systems are both right-handed, the transformation is rigid: nothing is deformed.

Calculation process:

Translation: the camera-frame point (Xc, Yc, Zc) is translated by (tx, ty, tz) to the world-frame point (Xw, Yw, Zw):

$$\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & t_x \\ 0 & 1 & 0 & t_y \\ 0 & 0 & 1 & t_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix}$$
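A homogeneous translation of this form is a one-liner with numpy; the translation components and the test point below are arbitrary assumed values:

```python
import numpy as np

tx, ty, tz = 10.0, -5.0, 2.0  # assumed translation components (mm)

# 4x4 homogeneous translation matrix
T = np.array([
    [1.0, 0.0, 0.0, tx],
    [0.0, 1.0, 0.0, ty],
    [0.0, 0.0, 1.0, tz],
    [0.0, 0.0, 0.0, 1.0],
])

Pc = np.array([1.0, 2.0, 3.0, 1.0])  # camera-frame point, homogeneous
Pw = T @ Pc                          # each component shifted by (tx, ty, tz)
```

The homogeneous 1 in the fourth component is what lets a single matrix multiplication express an additive shift.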

Rotation: besides the translation, the point (Xc, Yc, Zc) must also be rotated through certain angles to reach the world-frame point (Xw, Yw, Zw).

First, the basic 2D rotation matrix and the basic (identity) matrix are given:

$$R = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$

Basic matrix:

$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

If the point (Xc, Yc, Zc) is rotated by α, β, and γ about the X, Y, and Z axes respectively, the three basic rotation matrices are (honestly, the full affine-transformation derivation is skipped here; see the links at the end for the details):

$$R_x(\alpha) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\alpha & -\sin\alpha \\ 0 & \sin\alpha & \cos\alpha \end{bmatrix}, \quad R_y(\beta) = \begin{bmatrix} \cos\beta & 0 & \sin\beta \\ 0 & 1 & 0 \\ -\sin\beta & 0 & \cos\beta \end{bmatrix}, \quad R_z(\gamma) = \begin{bmatrix} \cos\gamma & -\sin\gamma & 0 \\ \sin\gamma & \cos\gamma & 0 \\ 0 & 0 & 1 \end{bmatrix}$$

with overall rotation R = Rz(γ) Ry(β) Rx(α). Combining rotation and translation, the rigid transformation is:

$$\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} = \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix}$$

### 2.4 Pixel coordinate system to world coordinate system transformation (the ultimate transformation)

Chaining the three conversions gives the complete projection from world coordinates to pixel coordinates:

$$Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} 1/dx & 0 & u_0 \\ 0 & 1/dy & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}$$

The rightmost factor (the red box in the figure) holds the extrinsic parameters: R and T are the rotation and translation between the world and camera frames (taken here in the world-to-camera direction).

The intrinsic parameters are inherent properties of the camera: the focal length f, the pixel sizes dx and dy, and the principal point (u0, v0).

Zc is the depth of the point along the camera's optical axis, i.e. its distance from the camera measured along the Z axis (not its distance from the axis).
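Putting all three stages together, the full world-to-pixel projection can be sketched as below. Every parameter value is an assumption chosen for illustration: the camera is taken as aligned with the world axes, with the world origin 1 m in front of it.

```python
import numpy as np

# All parameter values below are assumptions chosen for illustration.
f, dx, dy = 8.0, 0.005, 0.005   # focal length (mm), pixel size (mm)
u0, v0 = 320.0, 240.0           # principal point (pixels)

# Intrinsic matrix: combines the image->pixel and camera->image steps
K = np.array([
    [f / dx, 0.0,    u0],
    [0.0,    f / dy, v0],
    [0.0,    0.0,    1.0],
])

# Extrinsics (world -> camera): camera aligned with world axes,
# world origin 1000 mm in front of the camera
R = np.eye(3)
t = np.array([0.0, 0.0, 1000.0])
Rt = np.hstack([R, t[:, None]])  # 3x4 matrix [R | t]

Pw = np.array([10.0, 20.0, 0.0, 1.0])  # world point (mm), homogeneous
proj = K @ Rt @ Pw                     # equals Zc * (u, v, 1)
Zc = proj[2]
u, v = proj[0] / Zc, proj[1] / Zc
```

Going the other way, from (u, v) back to (Xw, Yw, Zw), requires inverting this chain, which is only possible when the depth Zc is known; that is why a single image cannot recover 3D position without extra information.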

Reference articles:

- How to use camera calibration and the calibration results
- 3D transformation basics: detailed explanation of translation, rotation, and scaling (affine transformations), with formula derivation
- Computer vision: principles of camera imaging: conversions between the world coordinate system, camera coordinate system, image coordinate system, and pixel coordinate system