Convert OpenCV camera to OpenGL camera

0. Preface

1. Preliminary knowledge and overview

This article assumes that you are already familiar with the concept of a pinhole camera, which is the calibration model for a typical camera in the OpenCV library. For more details on camera models and projective geometry, the best explanation comes from Richard Hartley and Andrew Zisserman's book Multiple View Geometry in Computer Vision, specifically Chapter 6, "Camera Models" (this is my extremely biased view). I will abbreviate this book as the "HZ book" for the rest of this tutorial.

Many camera parameters can be set using OpenGL function calls such as glFrustum() and glOrtho(). In this tutorial, however, I set the matrices directly and then send them to the shader. If these details are new to you, the code examples in this tutorial should make things clearer. Since the matrices are used directly, this tutorial may also provide some clues for deriving an OpenGL projection matrix for other camera models similar to the pinhole model.

2. Resources

Here are two resources I used to solve this problem. Both are excellent. If you have landed here, you have most likely found that neither of them details how to convert an OpenCV calibration matrix to an OpenGL matrix, which is the goal of this tutorial.

If you're new to modern OpenGL, the following set of tutorials is a good starting point:

3. Image coordinate systems in OpenCV and OpenGL

3.1. Principal axes in OpenCV/HZ and OpenGL

First, we will discuss the two standard image coordinate systems in detail.

In both the HZ and OpenCV conventions, the camera's principal axis is aligned with the positive z-axis. In other words, the positive z-axis points into the camera's field of view. In OpenGL, by contrast, the principal axis is aligned with the negative z-axis of the camera coordinate system. As a result, the two representations also differ by a 180-degree rotation about the x-axis.

3.2. Homogeneous coordinates and normalized device coordinates in OpenGL

  • In the OpenCV/HZ framework, there are three coordinate systems: image coordinate system, camera coordinate system and world coordinate system.

  • Within the OpenGL framework, there are four coordinate systems: image coordinate system, camera coordinate system, world coordinate system and... Normalized Device Coordinates or NDC. I'll explain how these coordinate systems work, along with the order of operations and other required basics, by analogy to the OpenCV/HZ framework.

4. Projection in OpenCV/HZ framework

Figure 1. Schematic of the relationship between the three coordinate systems in the OpenCV framework: world, camera, and image. The scene observed by the camera lies in the positive z direction of the camera coordinate system. Note that in the image coordinate system the origin is in the lower left corner and the y-axis points upward, the opposite of the y direction in the data-matrix layout; see Figure 3 for more. $(x_0, y_0)$ is the principal point in the image coordinate system, a parameter found during camera calibration. There is also a pixel coordinate system, which I will not go into here.


Setup:

  • $\mathbf{K}_{CV}$: upper-triangular $3 \times 3$ intrinsic camera calibration matrix.
  • $\mathbf{R}$ (rotation): orthogonal $3 \times 3$ matrix.
  • $\mathbf{t}$ (translation): column vector of size 3.
  • $\mathbf{X}$ (world point): column vector of size 4.
  • $\mathbf{x}_{CV}$ (image point): column vector of size 3.

So,

$$\mathbf{x}_{CV} = \mathbf{K}_{CV}\,[\mathbf{R} \mid \mathbf{t}]\,\mathbf{X}$$

Since we are dealing with homogeneous coordinates, we need to normalize $\mathbf{x}_{CV}$. Assume that vector indexing is 0-based (in other words, the first entry has index 0, so the third entry has index 2):

$$\mathbf{x}_{CV} = \frac{\mathbf{x}_{CV}}{\mathbf{x}_{CV}(2)}$$

After this operation,

$$\begin{aligned}\mathbf{x}_{CV}(0) &= x_{col}\\ \mathbf{x}_{CV}(1) &= x_{row}\\ \mathbf{x}_{CV}(2) &= 1\end{aligned}$$

If $x_{col}$ or $x_{row}$ is not within the image bounds, the point will not be drawn on the image. For example, image coordinates less than zero are discarded, as are coordinates greater than the image size. A similar process happens in OpenGL, just with one more dimension (z)!
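The projection and normalization above can be sketched in a few lines of NumPy. The calibration values here ($\alpha = \beta = 500$, principal point at the center of a 640×480 image) are made-up examples, not from a real camera:

```python
import numpy as np

# Hypothetical intrinsics for a 640x480 image: alpha, beta are focal
# lengths in pixels, (x0, y0) is the principal point.
K_cv = np.array([[500.0,   0.0, 320.0],
                 [  0.0, 500.0, 240.0],
                 [  0.0,   0.0,   1.0]])

# Identity rotation and zero translation: camera looking down +z.
R = np.eye(3)
t = np.zeros((3, 1))
Rt = np.hstack([R, t])                   # 3x4 matrix [R|t]

X = np.array([0.1, 0.2, 2.0, 1.0])       # homogeneous world point, z > 0

x_cv = K_cv @ Rt @ X                      # homogeneous 3-vector
x_cv = x_cv / x_cv[2]                     # normalize so x_cv[2] == 1

x_col, x_row = x_cv[0], x_cv[1]
rows, cols = 480, 640
visible = (0 <= x_col < cols) and (0 <= x_row < rows)
print(x_col, x_row, visible)              # 345.0 290.0 True
```

With these made-up values the point projects to column 345, row 290, inside the image bounds.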

5. Projection in the OpenGL framework

Figure 2. Note that this diagram is for the highly specific case of converting from the OpenCV convention to the OpenGL convention. Typically, the principal axis of an OpenGL camera coordinate system is the negative z-axis. If that is the case for your calibration matrix, stop and consult other guides.

Assume that the principal axis of our hypothetical calibration matrix is +z in the camera coordinate system. Then, just as in OpenCV, the pipeline starts as follows:

  • Transform the world point by the rotation and translation $[\mathbf{R} \mid \mathbf{t}]$.
  • Next, the camera coordinate system is slightly different: OpenGL has the concept of near and far planes, and these parameters are defined by the user. $\mathbf{K}_{GL}$ transforms points in the camera coordinate system into the next space, what I call cuboid space. This is not a proper rotation and translation; it includes a reflection into a left-handed coordinate system.
  • Then, the $\mathbf{NDC}$ transformation maps cuboid space into a cube with corners at $\pm 1$: the normalized device coordinate system.

This completes all of the user-specified transformations. Once the coordinates are in the left-handed normalized device coordinate system, OpenGL itself converts them into image coordinates. To troubleshoot, or to perform the conversion yourself, the equations can be found in the conversion box below.


Setup:

  • $\mathbf{NDC}$ (normalized device coordinates): $4 \times 4$ matrix.
  • $\mathbf{K}_{GL}$: $4 \times 4$ intrinsic camera calibration matrix.
  • $\mathbf{R}$ (rotation): orthogonal $3 \times 3$ matrix.
  • $\mathbf{t}$ (translation): column vector of size 3.
  • $\mathbf{X}$ (world point): column vector of size 4.
  • $\mathbf{x}_{GL}$ (image point): column vector of size 4.

Then,

$$\mathbf{x}_{GL} = \mathbf{NDC}\;\mathbf{K}_{GL}\begin{bmatrix}\mathbf{R} & \mathbf{t}\\ \mathbf{0}^{\top} & 1\end{bmatrix}\mathbf{X}$$

I will not specify the $\mathbf{NDC}$ matrix just yet; first, I will describe how all of the coordinate systems in OpenGL work. Don't worry, we will walk through each of these terms one by one.

First, OpenGL has the concept of clipping points/objects against planes: only points between the near and far planes are kept. In the OpenCV framework, by contrast, we consider any point between the principal plane and positive infinity to be visible; this is not the case in OpenGL. To account for these planes, where OpenCV's image-space clipping (previous section) is very intuitive - if a point is not in the image, defined as $\in [0, cols) \times [0, rows)$, it is not drawn - OpenGL uses 4-element homogeneous vectors to achieve a similar goal.

I denote OpenGL's NDC coordinates by $\mathbf{x}_{GL}$, a column vector with 4 elements. It is a homogeneous vector whose last element is usually denoted by the letter w. As in the OpenCV notation, we normalize the image point $\mathbf{x}_{GL}$ by dividing by its fourth entry (again assuming 0-based indexing); I will say that a 4-element vector is normalized when its last element equals 1:

$$\mathbf{x}_{GL} = \frac{\mathbf{x}_{GL}}{\mathbf{x}_{GL}(3)}$$

Similar to before, we get:

$$\begin{aligned}\mathbf{x}_{GL}(0) &= x_{NDC}\\ \mathbf{x}_{GL}(1) &= y_{NDC}\\ \mathbf{x}_{GL}(2) &= z_{NDC}\\ \mathbf{x}_{GL}(3) &= 1\end{aligned}$$

These coordinates are not yet image coordinates; OpenGL needs the z values to compute the drawing order of objects. The NDC space is a cube with side length 2, spanning $[-1, 1] \times [-1, 1] \times [-1, 1]$. Song Ho's website has some great illustrations of the NDC space.

If any coordinate $a$ of $\mathbf{x}_{GL}$ satisfies $|a| > 1$, then $\mathbf{x}_{GL}$ will not be drawn (or rather, the geometry with that coordinate will be clipped). In other words, if any coordinate is less than -1 or greater than 1, it is outside NDC space.
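As a sketch, this clipping test on a normalized $\mathbf{x}_{GL}$ amounts to a bounds check on the first three coordinates (the helper name `in_ndc` is mine, not an OpenGL call):

```python
import numpy as np

def in_ndc(x_gl):
    """True if a normalized 4-vector (last element 1) lies inside the
    [-1, 1]^3 NDC cube, i.e. would not be clipped."""
    assert abs(x_gl[3] - 1.0) < 1e-9, "normalize by x_gl[3] first"
    return bool(np.all(np.abs(x_gl[:3]) <= 1.0))

print(in_ndc(np.array([0.5, -0.25, 0.0, 1.0])))   # True: inside the cube
print(in_ndc(np.array([1.2,  0.0,  0.0, 1.0])))   # False: x > 1, clipped
```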

You may have noticed that the output of these OpenGL operations is not really image coordinates as we are used to in OpenCV; in other words, they are not coordinates in the data matrix. You are right. OpenGL takes care of converting to image space, but it is useful to understand how these conversions work. For troubleshooting purposes, refer to the conversion formula box below.


To convert OpenGL NDC coordinates to OpenGL image coordinates, where $\mathbf{x}_{image,GL}$ is a 3-element vector and $\mathbf{x}_{GL}$ has already been normalized:

$$\mathbf{x}_{image,GL} = \begin{bmatrix} \frac{cols}{2} & 0 & 0 \\ 0 & \frac{rows}{2} & 0 \\ 0 & 0 & 1 \end{bmatrix} \mathbf{x}_{GL}$$

Note that since the image coordinate system in OpenGL is defined differently than in OpenCV (see Figure 3), a further transformation is required to convert these coordinates into OpenCV coordinates:

$$\mathbf{x}_{CV} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & rows & 1 \end{bmatrix} \mathbf{x}_{image,GL}$$


OpenGL projection matrix

First, take the $[\mathbf{R} \mid \mathbf{t}]$ from the OpenCV formulation, but add a row to make it a square matrix. For example:

$$[\mathbf{R} \mid \mathbf{t}]_{GL} = \begin{bmatrix}\mathbf{R} & \mathbf{t}\\ \mathbf{0}^{\top} & 1\end{bmatrix}$$

Suppose you have an intrinsic matrix $\mathbf{K}_{CV}$ from an OpenCV context, of the form:

$$\mathbf{K}_{CV} = \begin{bmatrix}\alpha & 0 & x_0\\ 0 & \beta & y_0\\ 0 & 0 & 1\end{bmatrix}$$

Next, we will use this OpenCV matrix $\mathbf{K}_{CV}$ to create the corresponding OpenGL perspective projection matrix $\mathbf{K}_{GL}$. Note: the skew parameter in the first row, second column of the OpenCV intrinsic matrix can likely also be modeled in the $\mathbf{K}_{GL}$ representation, taking a negative value there in a manner similar to that described in Kyle Simek's guide. However, I haven't tested this, and I tend to set the skew parameter to zero when calibrating, so I leave it to you to test!

With these preparations complete, and using the image's rows and cols as dimensions, we define two new variables and a new intrinsic matrix for the OpenGL context.

$$A = -(near + far)$$

$$B = near \cdot far$$

$$\mathbf{K}_{GL} = \begin{bmatrix}-\alpha & 0 & -(cols - x_0) & 0\\ 0 & \beta & -(rows - y_0) & 0\\ 0 & 0 & A & B\\ 0 & 0 & 1 & 0\end{bmatrix}$$

$$\mathbf{NDC} = \begin{bmatrix}-\frac{2}{cols} & 0 & 0 & 1\\ 0 & \frac{2}{rows} & 0 & 1\\ 0 & 0 & \frac{-2}{far - near} & \frac{-(far + near)}{far - near}\\ 0 & 0 & 0 & 1\end{bmatrix}$$
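Here is a sketch of building these matrices in NumPy, transcribed term-for-term from the formulas above. The calibration values and near/far planes are made-up examples; I am only transcribing, so check the entries against your own calibration:

```python
import numpy as np

# Hypothetical calibration for a 640x480 image.
alpha, beta = 500.0, 500.0      # focal lengths in pixels
x0, y0 = 320.0, 240.0           # principal point
rows, cols = 480, 640
near, far = 0.1, 100.0          # user-chosen clipping planes

A = -(near + far)
B = near * far

K_gl = np.array([[-alpha,  0.0, -(cols - x0), 0.0],
                 [   0.0, beta, -(rows - y0), 0.0],
                 [   0.0,  0.0,            A,   B],
                 [   0.0,  0.0,          1.0, 0.0]])

NDC = np.array([[-2.0 / cols,         0.0,                  0.0, 1.0],
                [        0.0, 2.0 / rows,                  0.0, 1.0],
                [        0.0,         0.0, -2.0 / (far - near),
                 -(far + near) / (far - near)],
                [        0.0,         0.0,                  0.0, 1.0]])

# 4x4 [R|t]_GL built from a 3x3 R and a 3-vector t.
R = np.eye(3)
t = np.zeros(3)
Rt_gl = np.eye(4)
Rt_gl[:3, :3] = R
Rt_gl[:3, 3] = t

# The full user-specified transform of the pipeline.
P = NDC @ K_gl @ Rt_gl
print(P.shape)
```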

Now, take a closer look at Figure 2 and these matrices, and you might be thinking, "Gosh, why all this switching back and forth between positive and negative, right-handed and left-handed coordinate systems, and so on? Isn't that a drag?" My answer is: "Yes." A few things to note: I am covering the OpenGL pipeline from the perspective of a computer vision person, fond of the HZ book and OpenCV, and with a certain amount of hand-waving. What is even more confusing is that OpenGL's camera coordinate system has the negative z-axis as its principal axis. I'll say it again, in case you haven't noticed yet: if you have a matrix that was calibrated assuming the negative z-axis as the principal axis, check out other resources. I did a lot of testing to confirm that this works.

Before diving in, how do you test your camera model? It's easiest in a scripting language like MATLAB or Octave (free), but you can also do it with C++ and Eigen, Python, or any other language you're familiar with.

  • Get a coordinate from world space; this can come from an object (3D model) file. This is $\mathbf{X}$; remember that we are using homogeneous coordinates, so it has 4 elements.
  • Using the matrices you have, compute $\mathbf{x}_{CV}$, the projection onto the image plane in the OpenCV context.
  • Substitute all of the values from the OpenCV matrices into the OpenGL matrices above. Note that MATLAB and/or Octave are 1-based indexing languages, not 0-based; adjust accordingly.
  • If $\mathbf{x}_{CV}$ is on the image plane, then $\mathbf{x}_{GL}$ should also be on the image plane. If, after normalization, any coordinate of $\mathbf{x}_{GL}$ is less than -1 or greater than +1, it will be clipped. If something goes wrong, troubleshoot here!
  • Use the conversion-box transformations to check that the OpenCV coordinates are equivalent to the OpenGL coordinates.
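The checklist above might look like the following script. The matrices are transcribed from this tutorial and the calibration values are made-up examples, so treat it as a troubleshooting skeleton rather than a finished test; if $\mathbf{x}_{GL}$ comes out clipped while $\mathbf{x}_{CV}$ is on the image, that is exactly where to troubleshoot your matrix entries:

```python
import numpy as np

# Hypothetical calibration for a 640x480 image.
alpha, beta, x0, y0 = 500.0, 500.0, 320.0, 240.0
rows, cols = 480, 640
near, far = 0.1, 100.0

K_cv = np.array([[alpha, 0, x0], [0, beta, y0], [0, 0, 1.0]])
R, t = np.eye(3), np.zeros(3)
Rt = np.hstack([R, t.reshape(3, 1)])                  # 3x4 [R|t]

A, B = -(near + far), near * far
K_gl = np.array([[-alpha, 0, -(cols - x0), 0],
                 [0, beta, -(rows - y0), 0],
                 [0, 0, A, B],
                 [0, 0, 1.0, 0]])
NDC = np.array([[-2.0 / cols, 0, 0, 1],
                [0, 2.0 / rows, 0, 1],
                [0, 0, -2.0 / (far - near), -(far + near) / (far - near)],
                [0, 0, 0, 1.0]])
Rt_gl = np.vstack([Rt, [0, 0, 0, 1.0]])               # 4x4 [R|t]_GL

# Step 1: a world point in homogeneous coordinates.
X = np.array([0.1, 0.2, 2.0, 1.0])

# Step 2: OpenCV projection, normalized so the last element is 1.
x_cv = K_cv @ Rt @ X
x_cv = x_cv / x_cv[2]

# Step 3: OpenGL projection with the same calibration values.
x_gl = NDC @ K_gl @ Rt_gl @ X
x_gl = x_gl / x_gl[3]

# Step 4: clip check; every NDC coordinate should lie in [-1, 1].
clipped = bool(np.any(np.abs(x_gl[:3]) > 1.0))

print("x_cv:", x_cv)
print("x_gl:", x_gl, "clipped:", clipped)
```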


Figure 3. The upper subfigures show the image coordinate systems of the OpenCV and OpenGL contexts. The lower left shows the definition of the row (r) and column (c) indices of the OpenGL data matrix; the OpenGL data layout is the same as its image coordinate system (r = y). The origin of the data matrix in the OpenCV context is at the upper left corner, so an OpenGL image grabbed with glReadPixels() needs to be flipped vertically.

Finally, I will end with an academic point about the layout of the data matrix (the pixel coordinate system), in other words, the row and column indices of the data pixels, which is independent of the image coordinate system. OpenGL's layout differs from OpenCV's, as I mentioned in the conversion box and as described and illustrated in the caption of Figure 3.

My code (here) currently renders the scene with OpenGL in the correct orientation. Then it grabs the buffer with glReadPixels() and writes it to an OpenCV Mat image structure, flipped vertically so that it comes out right side up. For the detailed code, please refer to the original blog post.
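Since glReadPixels() returns rows bottom-to-top while an OpenCV data matrix is stored top-to-bottom, the grab-and-flip step can be sketched with NumPy. Here the `buf` array stands in for the raw RGB buffer returned by glReadPixels():

```python
import numpy as np

# Stand-in for the raw buffer from glReadPixels(0, 0, cols, rows, ...),
# whose rows are ordered bottom-to-top.
rows, cols = 4, 3
buf = np.arange(rows * cols * 3, dtype=np.uint8).reshape(rows, cols, 3)

# Vertical flip gives the top-to-bottom layout an OpenCV Mat expects;
# equivalent to cv2.flip(buf, 0).
img_cv = np.flipud(buf).copy()

print(np.array_equal(img_cv[0], buf[-1]))   # True: first row was last row
```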


Origin blog.csdn.net/fb_941219/article/details/130503359