Camera imaging: the conversion between the world coordinate system, camera coordinate system, image coordinate system, and pixel coordinate system

I started writing this article on the first day of the new semester; let's see when I actually manage to publish it. Of course, I got nothing done over the holidays, even though I swore before going home that I would study properly. Then I did start studying. After configuring the environment at the end of last year, experiments showed the results were not accurate: the D435i camera I use is infrared-based, and its error underwater is far too large, so the goal now is to compensate for that. What has my brain been doing every day? I can't remember; I honestly want to take a memory test to see whether something is wrong... Anyway, first let's learn the basic imaging principle of a camera~

The depth measurement process of a binocular stereo vision depth camera is as follows:

1) Calibration: first calibrate the binocular camera to obtain the intrinsic parameters, extrinsic parameters, and the homography matrix relating the two cameras.

2) Rectification: correct the original images according to the calibration result, so that the two rectified images lie on the same plane and are parallel to each other.

3) Matching: match pixels between the two rectified images (a code sketch of this step and the next follows this list).

4) Depth: compute the depth of each pixel from the matching result to obtain a depth map.
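To make steps 3) and 4) concrete, here is a minimal sketch using OpenCV's semi-global block matcher. It assumes the two input images are already calibrated and rectified (steps 1 and 2); the file names and matcher settings are placeholders, not values from this article:

import cv2
import numpy as np

# Load the two rectified grayscale images (placeholder file names).
left_img = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right_img = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Step 3: match pixels between the two rectified images.
matcher = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=5)
disparity = matcher.compute(left_img, right_img).astype(np.float32) / 16.0  # SGBM outputs fixed-point values scaled by 16

# Step 4: convert disparity to depth, e.g. via the reprojection matrix Q
# that cv2.stereoRectify returns during rectification.
# points_3d = cv2.reprojectImageTo3D(disparity, Q)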

Table of contents

1. Ideal binocular camera imaging model

2. Camera imaging principle

2.1 Conversion between the world coordinate system and the camera coordinate system (rigid body transformation: the object will not be deformed, only rotation and translation are required)

2.2 Conversion between camera coordinate system (Oc) and image coordinate system (Ox-y) (perspective projection: conversion from 3D to 2D)

2.3 Image coordinate system (Ox-y) and pixel coordinate system (Ou-v) (affine transformation)

3. Summary


1. Ideal binocular camera imaging model

c1 and c2 are the left and right cameras of an aligned binocular pair. The focal length of each camera is f, the distance between the cameras (the baseline) is b, the point P at the upper right is the target, its horizontal coordinate is x, and the vertical distance from the cameras to the target (the target's depth) is z.

If you want to compute the depth z, you need to know:

(1) The camera focal length f and the baseline b between the left and right cameras. These parameters can be obtained from prior information or from camera calibration.

(2) The disparity d. This requires knowing, for each pixel (xl, yl) in the left image, the corresponding point (xr, yr) in the right image; finding this correspondence is the core problem of binocular vision. With d = xl - xr, similar triangles then give the depth as z = f·b / d.
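In code, once f, b, and d are known, the depth is a single division. A minimal sketch, with all numeric values invented for illustration:

def depth_from_disparity(f_pixels, baseline_m, disparity_pixels):
    """Depth z = f * b / d, from similar triangles in the aligned stereo pair."""
    return f_pixels * baseline_m / disparity_pixels

# Example: f = 700 px, b = 0.05 m, d = 35 px  ->  z = 1.0 m
print(depth_from_disparity(700.0, 0.05, 35.0))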

2. Camera imaging principle

Before we can talk about calibration we need imaging, so let's first figure out how the camera forms an image.

In the camera imaging system, there are four coordinate systems: the world coordinate system, the camera coordinate system, the image coordinate system, and the pixel coordinate system.

World coordinate system: the three-dimensional coordinate system introduced to describe the position of the target object in the real world.

Camera coordinate system: centered on the camera, it describes the position of objects from the camera's point of view and serves as the bridge between the pixel coordinate system and the world coordinate system.

Image coordinate system: describes the projection of the real object onto the imaging plane at the camera's focal length; it connects the camera coordinate system and the pixel coordinate system.

Pixel coordinate system: the digital coordinate system introduced to describe the position of an object within the digital image (the photograph).

Going from one coordinate system to another, the transformation can be expressed as a rotation of the coordinate system plus a translation.

Why the transformation can be written this way: take a rotation by an angle θ around the Z axis as an example.

As the figure shows, when rotating around Z, the Z coordinate stays unchanged, so the three-dimensional problem reduces to a two-dimensional one in the X-Y plane, and the result follows from the planar geometry; the rotations around the other two axes are obtained in the same way.
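For reference, that planar geometry gives the standard rotation matrix about the Z axis:

R_z(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix}

The third row and column form the identity because Z is unchanged; R_x(\theta) and R_y(\theta) have the same form with the roles of the axes swapped, and a general rotation R is the product of the three.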

Now that the principle is clear, let's see how these four coordinate systems are chained together so that an object in the real world finally ends up imaged in a picture:

World coordinate system → Camera coordinate system → Image coordinate system → Pixel coordinate system

2.1 Conversion between the world coordinate system and the camera coordinate system (rigid body transformation: the object will not be deformed, only rotation and translation are required)

As shown in the figure below, R denotes the rotation matrix and T denotes the translation vector.
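In equation form this is Pc = R·Pw + T. A minimal numpy sketch of the rigid-body transform, with R and T made up for illustration:

import numpy as np

# Hypothetical extrinsics: rotate 90° about Z, then shift 1 m along X.
theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
T = np.array([1.0, 0.0, 0.0])

P_w = np.array([0.5, 0.0, 2.0])   # a point in the world coordinate system
P_c = R @ P_w + T                 # the same point in the camera coordinate system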

2.2 Conversion between camera coordinate system (Oc) and image coordinate system (Ox-y) (perspective projection: conversion from 3D to 2D)

To put it plainly, this step expresses the image coordinates x and y in terms of Xc, Yc, Zc. The figure below shows the ideal image coordinate system; in reality the lens introduces distortion, which needs to be corrected.
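Concretely, by similar triangles in the pinhole model, a point (Xc, Yc, Zc) in the camera coordinate system projects to

x = f \cdot \frac{X_c}{Z_c}, \qquad y = f \cdot \frac{Y_c}{Z_c}

which is exactly the 3D-to-2D step: the depth Zc is divided out, and only the direction of the ray through the optical center survives.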

2.3 Image coordinate system (Ox-y) and pixel coordinate system (Ou-v) (affine transformation)

Both the pixel coordinate system and the image coordinate system lie on the imaging plane; they differ only in origin and units. The origin of the image coordinate system is the intersection of the camera's optical axis with the imaging plane (point o in the figure above), usually the midpoint of the imaging plane, and its unit is mm. The origin of the pixel coordinate system is at the upper-left corner of the image, and its unit is the pixel.

2.3.1 The two axes are perpendicular (u ⊥ v, a Cartesian grid)

The coordinates (u, v) of a point in the image give the column and row of that point in the array the system stores for each captured frame, and the value at (u, v) is the grayscale of that point. Let point o have coordinates (u0, v0) in the pixel coordinate system, and let dx and dy be the physical size of one pixel along the x and y axes respectively, in mm (that is, 1 pixel = dx mm along x). The relationship between the two coordinate systems then follows, as written out below.
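That relationship, written out (standard pinhole bookkeeping for perpendicular axes):

u = \frac{x}{dx} + u_0, \qquad v = \frac{y}{dy} + v_0

or, in homogeneous matrix form,

\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} 1/dx & 0 & u_0 \\ 0 & 1/dy & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}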

2.3.2 One pair of axes parallel, the other not (skew angle θ between the axes)
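When the v axis is tilted so that it makes an angle θ with the u axis (u stays parallel to x), resolving the tilt geometrically gives the standard result:

u = \frac{x}{dx} - \frac{y \cot\theta}{dx} + u_0, \qquad v = \frac{y}{dy \sin\theta} + v_0

For θ = 90° this reduces to the perpendicular case above.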

3. Summary

Chaining the three transformations together gives:
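Written out in the no-skew case (θ = 90°), the standard form of this chained equation is:

Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} f/dx & 0 & u_0 \\ 0 & f/dy & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} R & T \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}

When θ ≠ 90°, a skew term enters the first row of the intrinsic matrix, as in section 2.3.2.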

Here Xw, Yw, Zw are the physical coordinates of a point in the world coordinate system, u, v are the pixel coordinates of the corresponding point in the pixel coordinate system, and Zc is a scale factor. The first matrix on the right-hand side is the camera's intrinsic matrix, which depends only on the camera's internal parameters: f is the focal length (image distance), dx and dy are the physical size of one pixel on the camera's sensor in the X and Y directions (that is, how many millimetres one pixel occupies), u0 and v0 are the pixel coordinates of the center of the sensor, and θ is the angle between the horizontal and vertical edges of the sensor (90 degrees means no skew error). The second matrix is the camera's extrinsic matrix, which depends on the relative pose of the camera coordinate system and the world coordinate system: R is the rotation matrix and T is the translation vector.
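As a cross-check of the whole chain, here is a minimal sketch that projects a world point to pixel coordinates; all parameter values are invented for illustration:

import numpy as np

def world_to_pixel(P_w, K, R, T):
    """Project a 3D world point to pixel coordinates via the pinhole model."""
    P_c = R @ P_w + T                        # world -> camera (rigid body, extrinsics)
    x, y = P_c[0] / P_c[2], P_c[1] / P_c[2]  # camera -> normalized image plane (perspective)
    u, v, _ = K @ np.array([x, y, 1.0])      # image plane -> pixels (intrinsics)
    return u, v

# Hypothetical intrinsics: f/dx = f/dy = 800 px, principal point (320, 240), no skew.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                  # camera axes aligned with the world
T = np.array([0.0, 0.0, 0.0])  # camera at the world origin

print(world_to_pixel(np.array([0.1, -0.05, 2.0]), K, R, T))  # -> (360.0, 220.0)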

In the next article, we will take a look at what calibration is.

 

 
