Digital image processing --- Intrinsic and extrinsic parameters of the camera (CV study notes)

Pinhole Camera Model

        A pinhole camera is a simple camera with no lens and only a small aperture. Light passes through the aperture and forms an inverted image on the other side of the camera. For the convenience of modeling, we can move the image from the physical imaging plane (image plane) to a position between the actual scene (the 3D object) and the center of projection (the pinhole, or focal point), and imagine it as a virtual image plane of the same size as the physical imaging plane. On this virtual image plane the image is no longer inverted but upright.

 

        With a real camera, the blue box in the picture above becomes the camera body, and the physical imaging plane (image plane) is digitized by a sensor composed of pixels and saved. For the camera, the focal point in the picture above corresponds to the camera lens, and the physical imaging plane needs to be converted into a pixel plane. The physical imaging plane and the pixel plane cover the same area but use different units of measurement: the physical imaging plane uses a physical unit such as mm, while the pixel plane is a two-dimensional image whose unit is the row and column index of a pixel in the image.

For the convenience of subsequent description, we first define four coordinate systems:

1. The two-dimensional image plane (focal plane) coordinate system (image plane): the origin is O_{i}, and the coordinate axes are x_{i} and y_{i}.

2. The two-dimensional image coordinate system (pixel plane): the origin is O_{p}, and the coordinate axes are u_{p} and v_{p}.

3. The three-dimensional camera coordinate system (pinhole plane/camera): the origin is O_{c}, and the coordinate axes are x_{c}, y_{c}, z_{c}.

4. The three-dimensional world coordinate system (world): the origin is O_{w}, and the coordinate axes are x_{w}, y_{w}, z_{w}.

        There are two steps in mapping the 3D world scene into a 2D image (pixel plane). The first step is to map the actual 3D object defined in the world coordinate system into the 3D camera coordinate system. This amounts to representing objects in the real world in two different coordinate systems, and then establishing the connection between the two coordinate systems by finding the transformation between them. This transformation is the conversion from O_{w} to O_{c} shown in the figure below.

        From 3D world coordinates to 3D camera coordinates, you need the extrinsic parameters, i.e. the extrinsic matrix [R|t].

        Second, from 3D camera coordinates to the 2D pixel plane, the intrinsic parameters, i.e. the intrinsic matrix K, are required. The imaged point is likewise represented in two different coordinate systems, and the connection between them (the physical imaging coordinate system and the two-dimensional image coordinate system) is then established so that the two can be converted into each other.


Extrinsic parameters: world coordinate system to camera coordinate system

        For a certain point M in the world, the point itself exists regardless of whether we have established a coordinate system. But once we artificially establish a coordinate system, this point acquires coordinate values in the coordinate system we defined. Point M can be expressed as M=[x_{w}^{M}, y_{w}^{M}, z_{w}^{M}] in the world coordinate system, and as M=[x_{c}^{M}, y_{c}^{M}, z_{c}^{M}] in the camera coordinate system. It is the same point, but the coordinate values corresponding to different coordinate systems are different. (Here the superscript "M" in x_{w}^{M} denotes the point M, and the subscript "w" denotes the world coordinate system; the meaning of the other subscripts follows the four coordinate systems defined above.)

         Compared with the world coordinate system, the origin of the camera coordinate system generally does not coincide with the world origin. There is therefore a certain displacement along x, y and z, represented by a 3x1 vector t (translation), in which each element corresponds to the displacement in one of the x, y, z directions:

t=\begin{bmatrix} t_{x}\\ t_{y}\\ t_{z} \end{bmatrix}

In addition, we cannot guarantee that the camera has no angular deviation when taking pictures. There is therefore an overall rotation between the coordinate axes of these two coordinate systems, represented by a 3x3 matrix R (rotation):

R=\begin{bmatrix} r_{11} &r_{12} &r_{13} \\ r_{21} &r_{22} &r_{23} \\ r_{31} &r_{32} & r_{33} \end{bmatrix}

The two are combined to obtain the augmented matrix [R|t], such that:

[R|t]\begin{bmatrix} x_{w}^{M}\\ y_{w}^{M}\\ z_{w}^{M}\\ 1 \end{bmatrix}=\begin{bmatrix} x_{c}^{M}\\ y_{c}^{M}\\ z_{c}^{M} \end{bmatrix}

where:

[R|t]=\begin{bmatrix} r_{11} &r_{12} &r_{13} & t_{x}\\ r_{21} & r_{22} &r_{23} & t_{y} \\ r_{31} &r_{32} &r_{33} & t_{z} \end{bmatrix}

        The meaning of this mathematical expression is: if a point defined in the world coordinate system is to be represented in the camera coordinate system, this can be achieved by left-multiplying the point's homogeneous world coordinates by the matrix [R|t].

In this way, the coordinate value of the point M in the world coordinate system is converted into its coordinate value in the camera coordinate system:

\begin{bmatrix} x_{c}^{M}\\ y_{c}^{M}\\ z_{c}^{M} \end{bmatrix} = \begin{bmatrix} r_{11} &r_{12} &r_{13} & t_{x}\\ r_{21} & r_{22} &r_{23} & t_{y} \\ r_{31} &r_{32} &r_{33} & t_{z} \end{bmatrix} \begin{bmatrix} x_{w}^{M}\\ y_{w}^{M}\\ z_{w}^{M}\\ 1 \end{bmatrix}
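To make this step concrete, here is a minimal numpy sketch of the extrinsic transformation. The rotation matrix and translation vector are made-up example values, not parameters of any real camera:

```python
import numpy as np

# Hypothetical extrinsic parameters: identity rotation plus a small translation.
R = np.eye(3)                          # 3x3 rotation matrix
t = np.array([[0.1], [0.0], [0.5]])    # 3x1 translation vector

Rt = np.hstack([R, t])                 # 3x4 extrinsic matrix [R|t]

# A point M in world coordinates, written in homogeneous form (append 1).
M_world_h = np.array([1.0, 2.0, 10.0, 1.0])

# Left-multiplying by [R|t] gives the same point in camera coordinates.
M_cam = Rt @ M_world_h                 # -> [x_c^M, y_c^M, z_c^M]
print(M_cam)
```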


Intrinsic parameters (internal parameters):

        Through the previous step, we found the connection between the world coordinate system and the camera coordinate system, which amounts to learning to represent objects in the world (3D objects) in the camera coordinate system. Now, within the camera coordinate system, we describe both the actual object in the world and its image on the imaging plane that has been "moved to the front": that is, in the camera coordinate system we use different coordinate values to define the actual object point, the large M, and its image on the virtual imaging plane, the small m (Figure 1), and then find the connection between them.

 (Figure 1)

        O_{c} represents the optical center, also called the center of projection. The straight line that passes through the optical center and is perpendicular to the physical imaging plane is called the principal axis, and the point O where it meets the plane is called the principal point. The distance between the optical center O_{c} and the principal point O is the focal length f.

        Now, in the camera coordinate system, we let the coordinate value of a certain point M in the world be M=[x_{c}^{M}, y_{c}^{M}, z_{c}^{M}]. Its image on the virtual imaging plane is the small m, whose coordinate value is m=[x_{c}^{m}, y_{c}^{m}, z_{c}^{m}]. (Note: in the superscripts of x, y, z, a capital M denotes the coordinate value of the actual point, large M, and a lowercase m denotes its image point on the virtual imaging plane, small m.) At the same time, we make the principal axis coincide with the z_{c} axis of the camera coordinate system. Looking only at the plane formed by the y_{c} and z_{c} axes of the camera coordinate system (Figure 2), we let M_{y} be the projection of the large M onto this plane, and m_{y} be the projection of the small m onto the y_{c}-z_{c} plane.

 (Figure 2)

        In triangle O_{c}Om_{y}, the length of segment O_{c}O is the coordinate value of the small m along the z_{c} axis, z_{c}^{m}, and the length of segment Om_{y} is the coordinate value of the small m along the y_{c} axis, y_{c}^{m}. In triangle O_{c}QM_{y}, the length of segment O_{c}Q is z_{c}^{M}, and the length of segment QM_{y} is y_{c}^{M}. From the similarity of triangles O_{c}Om_{y} and O_{c}QM_{y}, the following relationship can be established:

z_{c}^{M}/z_{c}^{m}=y_{c}^{M}/{y_{c}^{m}}

        And because the small m point always lies on the virtual imaging plane, in the 3D camera coordinate system its z_{c}^{m} is always equal to the focal length f. Substituting into the above formula, we get:

z_{c}^{M}/f=y_{c}^{M}/{y_{c}^{m}}

{y_{c}^{m}}={f}*y_{c}^{M}/z_{c}^{M}

        Similarly, if we only look at the plane formed by the x_{c} and z_{c} axes of the camera coordinate system (see Figure 3), let M_{x} denote the projection of the large M onto this plane, and m_{x} the projection of the small m onto the x_{c}-z_{c} plane:

 (Figure 3)

From the similarity of triangles O_{c}Om_{x} and O_{c}QM_{x}, the following relationship can be established:

z_{c}^{M}/z_{c}^{m}=x_{c}^{M}/{x_{c}^{m}}

z_{c}^{M}/{f}=x_{c}^{M}/{x_{c}^{m}}

{x_{c}^{m}}={f}*x_{c}^{M}/z_{c}^{M}

        In this way, we have established the relationship between the large M in the world and the corresponding point small m on the virtual imaging plane in the camera coordinate system:

{x_{c}^{m}}={f}*x_{c}^{M}/z_{c}^{M}

{y_{c}^{m}}={f}*y_{c}^{M}/z_{c}^{M}

(The above two formulas are collectively called Formula 1.)
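As a quick numerical check of Formula 1, the following sketch just evaluates the two projection equations; the focal length and the point coordinates are arbitrary illustrative values:

```python
# Formula 1: project a camera-frame point M onto the virtual image plane.
f = 0.035                            # assumed focal length (meters)

x_cM, y_cM, z_cM = 1.0, 2.0, 10.0    # assumed point M in camera coordinates

# The image point m lies on the plane z_c = f (similar triangles).
x_cm = f * x_cM / z_cM
y_cm = f * y_cM / z_cM
print(x_cm, y_cm)                    # coordinates of small m on the image plane
```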


Camera coordinate system to image plane coordinate system:

        Moreover, the small m point on the virtual imaging plane has coordinates not only in the 3D camera coordinate system but also in the 2D image plane coordinate system, and the origin of the image plane coordinate system lies on the principal axis, i.e. at the principal point O. This means that for the principal point O, its x and y coordinate values in the camera coordinate system equal its coordinate values in the 2D image plane coordinate system: in the camera coordinate system they are [x_{c}^{O}=0, y_{c}^{O}=0], and in the 2D image plane coordinate system they are likewise [x_{i}^{O}=0, y_{i}^{O}=0].

        In the same way, knowing that the coordinate value of the small m point in the camera coordinate system is m=[x_{c}^{m}, y_{c}^{m}], and letting its coordinate value in the image plane be m=[x_{i}^{m}, y_{i}^{m}], we have:

{x_{i}^{m}}={x_{c}^{m}}

{y_{i}^{m}}={y_{c}^{m}}

(Formula 2)

As shown in Figure 4:

 (Figure 4)

This completes the conversion from the camera coordinate system to the image plane coordinate system.


Image plane coordinate system to image coordinate system:

        Inside the camera, the physical imaging plane is sampled by the sensor in units of pixels, and the origin O_{p} of the image coordinate system is at the upper left corner of the image, see Figure 5. Therefore, the coordinate value of the small m point in the image plane coordinate system needs a further conversion into the image coordinate system.

 (Figure 5)

        On the one hand, the image coordinate system is a pixel-by-pixel sampling of the image plane. Therefore, a conversion is needed from the image plane coordinate system, measured in mm, to the image coordinate system, measured in pixels.

        Assume that the physical size of the image sensor, i.e. the size of the physical imaging plane, is m x n (mm), and the image saved by the sensor is w x h (pixels). To map the m x n (mm) plane onto the w x h (pixel) image, the proportional relationship between the physical imaging plane in mm and the image in pixels is:

dx=m/w(mm/pixel)

dy=n/h(mm/pixel)

The first equation represents how wide each pixel in the image is physically, in mm.

The second equation represents how tall each pixel in the image is physically, in mm.

        In this way, the coordinate value of the small m point in the image plane coordinate system (that is, the length x_{i}^{m} (mm) in the x_{i} direction and the length y_{i}^{m} (mm) in the y_{i} direction) can be converted into its coordinate value in the image coordinate system (which column and row the point falls in):

u_{p}^{m}(pixel)=x_{i}^{m}(mm)/dx(mm/pixel)

v_{p}^{m}(pixel)=y_{i}^{m}(mm)/dy(mm/pixel)

(Formula 3)
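A minimal sketch of Formula 3, assuming an illustrative sensor size and image resolution (the numbers are invented for the example, not taken from any particular camera):

```python
# Assumed sensor size (mm) and saved image size (pixels).
m_mm, n_mm = 6.4, 4.8        # physical sensor width x height in mm
w_px, h_px = 640, 480        # image width x height in pixels

dx = m_mm / w_px             # mm per pixel in the horizontal direction
dy = n_mm / h_px             # mm per pixel in the vertical direction

# Formula 3: convert image-plane coordinates (mm) into pixel units.
x_i_m, y_i_m = 0.5, -0.2     # small m in the image plane coordinate system (mm)
u_p_m = x_i_m / dx
v_p_m = y_i_m / dy
print(u_p_m, v_p_m)          # still measured from the image plane origin O_i
```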


        On the other hand, the origin of the two-dimensional image coordinate system is at the upper left corner of the image (sensor), while the origin of the image plane coordinate system is at the center of the sensor. Therefore, for the same principal point O, its coordinate value in the 2D image coordinate system differs from its coordinate value in the 2D image plane coordinate system; there is an offset between the two. We define the offset between O_{p} and O_{i} in the u_{p} direction as u_{p}^{offset}, which is equal to half the width of the image (w/2), and the offset between O_{p} and O_{i} in the v_{p} direction as v_{p}^{offset}, which is equal to half the height of the image (h/2).

The coordinate value of the principal point O in the image coordinate system is:

{u_{p}^{O}}(pixel)={x_{i}^{O}}(mm)/dx(mm/pixel)+{u_{p}^{offset}}(pixel)

{v_{p}^{O}}(pixel)={y_{i}^{O}}(mm)/dy(mm/pixel)+{v_{p}^{offset}}(pixel)

(Formula 4)

where:

{u_{p}^{offset}}=w/2(pixel)

{v_{p}^{offset}}=h/2(pixel)

        The meaning of Formula 4 is: the coordinate value of the principal point O in the image coordinate system is equal to its coordinate value in the image plane coordinate system (converted to pixels) plus a fixed offset. In the same way, the coordinate value of the small m point, already converted to pixel units (see Formula 3), becomes, after adding the offset:

u_{p}^{m}(pixel)=x_{i}^{m}(mm)/dx(mm/pixel)+{u_{p}^{offset}}(pixel)

v_{p}^{m}(pixel)=y_{i}^{m}(mm)/dy(mm/pixel)+{v_{p}^{offset}}(pixel)

(Formula 5)

Furthermore, substituting Formula 2 into Formula 5, we have:

u_{p}^{m}(pixel)=x_{c}^{m}(mm)/dx(mm/pixel)+{u_{p}^{offset}}(pixel)

v_{p}^{m}(pixel)=y_{c}^{m}(mm)/dy(mm/pixel)+{v_{p}^{offset}}(pixel)

Then substituting Formula 1 gives:

u_{p}^{m}(pixel)={f}*x_{c}^{M}/z_{c}^{M}(mm)/dx(mm/pixel)+{u_{p}^{offset}}(pixel)

v_{p}^{m}(pixel)={f}*y_{c}^{M}/z_{c}^{M}(mm)/dy(mm/pixel)+{v_{p}^{offset}}(pixel)

(Formula 6)

Letting f_{x}=f/dx and f_{y}=f/dy, the above formulas simplify to:

u_{p}^{m}(pixel)=f_{x}*x_{c}^{M}/z_{c}^{M}(pixel)+{u_{p}^{offset}}(pixel)

v_{p}^{m}(pixel)=f_{y}*y_{c}^{M}/z_{c}^{M}(pixel)+{v_{p}^{offset}}(pixel)

(Formula 7)

where:

1. f_{x} indicates how many pixels the physical focal length f, measured in mm, corresponds to in the horizontal direction.

2. f_{y} indicates how many pixels the physical focal length f, measured in mm, corresponds to in the vertical direction.
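Putting f_{x}, f_{y} and the offsets together, here is a sketch of Formula 7; all numbers (focal length, pixel size, image size, point coordinates) are assumptions made for the example:

```python
f_mm = 4.0                           # assumed physical focal length (mm)
dx, dy = 0.01, 0.01                  # assumed pixel size (mm/pixel)
w_px, h_px = 640, 480                # assumed image size (pixels)

f_x = f_mm / dx                      # focal length expressed in horizontal pixels
f_y = f_mm / dy                      # focal length expressed in vertical pixels
c_x, c_y = w_px / 2, h_px / 2        # principal point offsets (pixels)

x_cM, y_cM, z_cM = 1.0, 2.0, 10.0    # assumed point M in camera coordinates

# Formula 7: pixel coordinates of the projection of M.
u = f_x * x_cM / z_cM + c_x
v = f_y * y_cM / z_cM + c_y
print(u, v)
```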

Formula 7 can be expressed in matrix form as:

\begin{bmatrix}u_{p}^{m} \\v_{p}^{m} \\1 \end{bmatrix}=\begin{bmatrix} f_{x} & 0&u_{p}^{offset} \\ 0& f_{y} &v_{p}^{offset} \\ 0& 0& 1\end{bmatrix}\begin{bmatrix}x_{c}^{M}/z_{c}^{M} \\y_{c}^{M}/z_{c}^{M} \\1 \end{bmatrix}

Generally speaking, the offsets u_{p}^{offset} and v_{p}^{offset} in the x and y directions are exactly half the width and height of the image, that is, c_{x}=w/2 and c_{y}=h/2. With this notation, the intrinsic equation becomes:

\begin{bmatrix}u_{p}^{m} \\v_{p}^{m} \\1 \end{bmatrix}=\begin{bmatrix} f_{x} & 0&c_{x} \\ 0& f_{y} &c_{y} \\ 0& 0& 1\end{bmatrix}\begin{bmatrix}x_{c}^{M}/z_{c}^{M} \\y_{c}^{M}/z_{c}^{M} \\1 \end{bmatrix}

The 3x3 matrix in this expression is called the intrinsic parameter matrix, denoted by the uppercase letter K.
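As a check, a few lines (with assumed intrinsic values) showing that multiplying by K reproduces Formula 7:

```python
import numpy as np

f_x, f_y = 400.0, 400.0          # assumed focal lengths in pixels
c_x, c_y = 320.0, 240.0          # assumed principal point (pixels)

K = np.array([[f_x, 0.0, c_x],
              [0.0, f_y, c_y],
              [0.0, 0.0, 1.0]])  # intrinsic parameter matrix

x_cM, y_cM, z_cM = 1.0, 2.0, 10.0
uvw = K @ np.array([x_cM / z_cM, y_cM / z_cM, 1.0])
print(uvw[:2])                   # same u, v as computed directly from Formula 7
```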


 Summary:

Finally, let's sort out the entire conversion process (a code sketch of the full chain follows this list):

1. The coordinate value of the big M point in the world coordinate system, [x_{w}^{M}, y_{w}^{M}, z_{w}^{M}], is transformed by the extrinsic matrix [R|t] into its coordinate value in the camera coordinate system, [x_{c}^{M}, y_{c}^{M}, z_{c}^{M}]. (Find the relationship between two coordinate systems through the coordinate values of the same point in different coordinate systems.)

2. In the camera coordinate system, calculate, based on similar triangles, the coordinate value [x_{c}^{m}, y_{c}^{m}, z_{c}^{m}] of the small m point on the virtual imaging plane that corresponds to the large M point. (Find the relationship between the coordinate values of these two points within the same coordinate system.)

3. According to the position of the virtual imaging plane in the camera coordinate system, from the coordinate value [x_{c}^{m}, y_{c}^{m}, z_{c}^{m}=f] of the small m point in the camera coordinate system, obtain its coordinate value [x_{i}^{m}=x_{c}^{m}, y_{i}^{m}=y_{c}^{m}] in the image plane coordinate system. (Find the relationship between two coordinate systems through the coordinate values of the same point in different coordinate systems.)

4. Finally, according to the relative relationship between the image plane coordinate system and the image coordinate system, the coordinate value [x_{i}^{m}, y_{i}^{m}] of the small m point in the image plane is converted, via the intrinsic parameter matrix, into the corresponding coordinate value [u_{p}^{m}, v_{p}^{m}] in the image coordinate system.
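To tie the four steps together, here is a minimal end-to-end sketch (world point to pixel) using made-up extrinsic and intrinsic values; in practice K and [R|t] would come from camera calibration (for example OpenCV's calibrateCamera):

```python
import numpy as np

# Assumed extrinsics: rotate 10 degrees about z and translate slightly.
theta = np.deg2rad(10)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])
t = np.array([[0.2], [0.1], [0.0]])
Rt = np.hstack([R, t])                       # 3x4 extrinsic matrix [R|t]

# Assumed intrinsics (f_x, f_y, c_x, c_y in pixels).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

# Step 1: world point (homogeneous) -> camera coordinates.
M_world_h = np.array([0.5, -0.3, 4.0, 1.0])
M_cam = Rt @ M_world_h                       # [x_c^M, y_c^M, z_c^M]

# Steps 2-4: intrinsic matrix + perspective division -> pixel coordinates.
uv_h = K @ M_cam                             # homogeneous pixel coordinates
u, v = uv_h[0] / uv_h[2], uv_h[1] / uv_h[2]  # divide by z_c^M
print(u, v)
```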

References:

        1. https://www.cnblogs.com/xiaohuidi/p/15711767.html

        2. What Is Camera Calibration? - MATLAB & Simulink - MathWorks China

        3. 2.3 Camera model of perspective projection - bilibili

 


