5 coordinate systems every OpenGL developer must understand

OpenGL expects that after each vertex shader run, all vertices we wish to become visible are in normalized device coordinates. That is, the x, y, z coordinates of each vertex should be between -1.0 and 1.0; coordinates outside this range will not be visible. What we usually do is specify coordinates in a range (or space) that we determine ourselves, and convert those coordinates to Normalized Device Coordinates (NDC) in the vertex shader. These NDCs are then fed to the rasterizer, which converts them to 2D coordinates/pixels on the screen.



Converting coordinates to NDC is usually done step by step: we transform an object's vertices through several intermediate coordinate systems before finally converting them to NDC. The advantage of converting them through several intermediate coordinate systems is that certain operations/calculations are easier in certain coordinate systems, as will become apparent shortly. There are a total of 5 different coordinate systems that are important to us:

  • local space (or object space)
  • world space
  • view space (or eye space)
  • clip space
  • screen space

These are different states where our vertices are transformed before they end up as fragments.

You're probably pretty confused by now about what spaces or coordinate systems actually are, so we'll start by explaining them in a more high-level way by showing the overall picture and what each specific space represents.

1. The global picture

To transform coordinates from one space to the next, we will use several transformation matrices, the most important of which are the model, view and projection matrices. Our vertex coordinates start out in local space as local coordinates, which are then further processed into world coordinates, view coordinates, clip coordinates, and finally as screen coordinates. The following diagram shows the process and shows what each transformation does:
[Figure: vertex coordinates moving from local space to world space (model matrix), to view space (view matrix), to clip space (projection matrix), and finally to screen space (viewport transform)]

  • Local coordinates are the coordinates of an object relative to its local origin; they are the coordinates where your object starts.
  • The next step is to convert the local coordinates to world space coordinates, which are coordinates relative to the larger world. These coordinates are relative to some global origin of the world, and many other objects are placed relative to that world origin.
  • Next, we transform the world coordinates into view space coordinates such that each coordinate is seen from the camera or viewer's point of view.
  • Once the coordinates are in view space, we want to project them to clip coordinates. Clip coordinates are processed to the -1.0 to 1.0 range and determine which vertices will end up on the screen. Projecting to clip space coordinates can add perspective if a perspective projection is used.
  • Finally, we transform clip coordinates to screen coordinates in a process we call the viewport transform, which maps coordinates from -1.0 and 1.0 to the coordinate range defined by glViewport. The resulting coordinates are then sent to the rasterizer to turn them into fragments.

You should now have a rough idea of what each individual space is used for. The reason we transform vertices into all these different spaces is that certain operations make more sense or are easier to use in certain coordinate systems. For example, when modifying an object it makes most sense to do so in local space, while calculating operations on an object relative to the position of other objects makes most sense in world coordinates, and so on. We could define a single transformation matrix that goes from local space straight to clip space if we wanted, but that would cost us flexibility.

We discuss each coordinate system in more detail below.

2. Local space

Local space is the coordinate space local to the object, i.e. where the object starts. Imagine you have created a cube in a modeling package such as Blender. The origin of your cube may be at (0,0,0), even though your cube may end up at a different location in your final application. Probably all the models you create will have (0,0,0) as their initial position. Therefore, all vertices of the model are located in local space: they are all local to the object.

The vertices of the containers we use are specified as coordinates between -0.5 and 0.5, with 0.0 as the origin. These are local coordinates.
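For reference, a few of those local-space positions could look like the sketch below (the real container data also includes texture coordinates, which are left out here):

float vertices[] = {
    // local-space positions, all between -0.5 and 0.5
    -0.5f, -0.5f, 0.0f,
     0.5f, -0.5f, 0.0f,
     0.5f,  0.5f, 0.0f,
    -0.5f,  0.5f, 0.0f
};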

3. World space

If we imported all objects directly into the application, they might all be somewhere inside each other at the world origin (0,0,0), which is not what we want. We want to define a position for each object to place them in the larger world.

Coordinates in world space are exactly what they sound like: the coordinates of all vertices relative to the (game) world. This is the coordinate space you want to transform your objects into so they are all scattered around the world (preferably in a realistic way). The object's coordinates are transformed from local space to world space; this is done with the model matrix.

The model matrix is a transformation matrix that translates, scales and/or rotates your object to place it in the world at the location/orientation it belongs. Think of it as transforming a house: you scale it down (it was a bit too large in local space), translate it to a suburban town, and rotate it a bit to the left on the y-axis so that it fits neatly with the neighboring houses. You can think of the matrix from the previous chapter that placed the containers throughout the scene as a kind of model matrix as well; we transformed the container's local coordinates to a different position in the scene/world.
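A minimal sketch of such a model matrix with GLM could look like this (the translation, rotation, and scale values are made up for illustration):

glm::mat4 model = glm::mat4(1.0f);
model = glm::translate(model, glm::vec3(5.0f, 0.0f, -3.0f));                  // place the object somewhere in the world
model = glm::rotate(model, glm::radians(20.0f), glm::vec3(0.0f, 1.0f, 0.0f)); // rotate it a bit around the y-axis
model = glm::scale(model, glm::vec3(0.5f));                                   // it was a bit too large in local space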

4. View space

View space is what people usually call OpenGL's camera (it is sometimes also known as camera space or eye space). View space is the result of transforming world space coordinates to coordinates that are in front of the user's view. View space is thus the space as seen from the camera's point of view. This is usually accomplished with a combination of translations and rotations that move the scene so that certain items end up in front of the camera. These combined transformations are usually stored in a view matrix that transforms world coordinates into view space. In the next chapter, we will discuss extensively how to create such a view matrix to simulate a camera.
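As a quick preview, GLM provides a lookAt helper that builds a view matrix from a camera position, a target point, and an up direction (the values below are purely illustrative; the next chapter builds the view matrix step by step):

glm::mat4 view = glm::lookAt(
    glm::vec3(0.0f, 0.0f, 3.0f),  // camera position in world space
    glm::vec3(0.0f, 0.0f, 0.0f),  // point the camera looks at
    glm::vec3(0.0f, 1.0f, 0.0f)); // the world's up direction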

5. Clip space

At the end of each vertex shader run, OpenGL expects coordinates to be within a certain range, and any coordinates outside that range are clipped. The clipped coordinates are discarded, so the remaining coordinates end up as fragments visible on the screen. This is where the clip space name comes from.

Because specifying all visible coordinates within the -1.0 to 1.0 range is not very intuitive, we specify our own coordinate set to work in and convert those back to NDC as OpenGL expects.

To transform vertex coordinates from view space to clip space, we define a so-called projection matrix, which specifies a range of coordinates, for example -1000 to 1000 in each dimension. The projection matrix then converts coordinates within this specified range to normalized device coordinates (-1.0, 1.0) (not directly — a step called perspective division sits in between). All coordinates outside this range are not mapped between -1.0 and 1.0 and are therefore clipped. With the range we specified in the projection matrix, a coordinate of (1250, 500, 750) would not be visible, since its x coordinate is out of range and is therefore converted to a coordinate higher than 1.0 in NDC and clipped.

Note that if only part of the primitive, such as a triangle, is outside the clipping volume, OpenGL will reconstruct the triangle into one or more triangles to fit within the clipping volume.

This viewing box created by the projection matrix is called the viewing frustum, and every coordinate that ends up inside that frustum will end up on the user's screen. The whole process of converting coordinates within a specified range to NDC that can easily be mapped to 2D screen coordinates is called projection, since the projection matrix projects 3D coordinates to normalized device coordinates that are easily mapped to 2D.

Once all vertices are transformed to clip space, a final operation called perspective division is performed, where we divide the x, y, and z components of the position vector by the homogeneous w component of the vector; perspective division converts 4D clip space coordinates to 3D normalized device coordinates. This step is performed automatically at the end of the vertex shader step.
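A quick numeric sketch of that division (the clip-space values are made up for illustration):

glm::vec4 clip(0.5f, 1.0f, -1.5f, 2.0f);  // hypothetical clip-space position with w = 2.0
glm::vec3 ndc = glm::vec3(clip) / clip.w; // (0.25, 0.5, -0.75): within the -1.0..1.0 NDC range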

After this stage, the resulting coordinates are mapped to screen coordinates (using glViewport's settings) and converted to fragments.

The projection matrix that transforms view coordinates to clip coordinates usually takes two different forms, each of which defines its own unique viewing frustum. We can create an orthographic projection matrix or a perspective projection matrix.

6. Orthographic projection

The orthographic projection matrix defines a cube-like volume that defines the clip space, clipping every vertex outside that box. When creating an orthographic projection matrix, we specify the width, height and length of the visible volume. All coordinates inside this volume end up within the NDC range after being transformed by its matrix, and thus won't be clipped. The volume looks a bit like a container:
[Figure: the box-shaped orthographic viewing volume between the near and far planes]

The volume defines the visible coordinates and is specified by a width, a height, and near and far planes. Any coordinate in front of the near plane is clipped, and the same applies to coordinates behind the far plane. Orthographic projection directly maps all coordinates inside the volume to normalized device coordinates without any special side effects, since it does not touch the w component of the transformed vector; if the w component remains equal to 1.0, perspective division does not change the coordinates.

To create an orthographic projection matrix, we use GLM's built-in function glm::ortho:

glm::ortho(0.0f, 800.0f, 0.0f, 600.0f, 0.1f, 100.0f);

The first two parameters specify the left and right coordinates of the volume, and the third and fourth parameters specify its bottom and top. With these 4 points we've defined the size of the near and far planes, and the 5th and 6th parameters define the distances of the near plane and the far plane. This specific projection matrix transforms all coordinates within these x, y, and z ranges to normalized device coordinates.
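To see the mapping in action, here is a small sketch applying that matrix to the center of the volume (an illustrative check, not part of the chapter's code):

glm::mat4 ortho = glm::ortho(0.0f, 800.0f, 0.0f, 600.0f, 0.1f, 100.0f);
glm::vec4 ndc = ortho * glm::vec4(400.0f, 300.0f, -1.0f, 1.0f); // a point in the middle of the volume
// ndc.x and ndc.y are both 0.0: the center of the 800x600 range maps to the center of NDC,
// and w stays 1.0, so perspective division leaves the result unchanged.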

Orthographic projection matrices directly map coordinates to the 2D plane of the screen, but in practice, direct projection produces unrealistic results because projection does not take perspective into account. This is the problem that the perspective projection matrix solves for us.

7. Perspective projection

If you've ever enjoyed the graphics real life has to offer, you'll have noticed that objects farther away appear much smaller. This curious effect is what we call perspective. Perspective is especially noticeable when looking down the end of an infinite highway or railroad, as shown in the image below:

[Figure: a railroad track whose rails appear to converge in the distance]

As you can see, the lines seem to coincide at a far enough distance due to perspective. This is exactly what perspective projection tries to mimic, and it does so using a perspective projection matrix.

The projection matrix maps a given frustum extent to clip space, but also manipulates the w value of each vertex coordinate in such a way that the w component becomes higher the further the vertex coordinate is from the viewer. Once the coordinates are converted to clip space, they will be in the range -w to w (anything outside that will be clipped). OpenGL requires visible coordinates to fall between the -1.0 and 1.0 range for final vertex shader output, so perspective division is applied to clip space coordinates once they are in clip space:
out = ( x/w , y/w , z/w )

Each component of the vertex coordinate is divided by its w component, giving smaller vertex coordinates the farther away a vertex is from the viewer. This is another reason why the w component is important: it helps us with perspective projection. The resulting coordinates are then in normalized device space. If you're interested in seeing how the orthographic and perspective projection matrices are actually calculated (and aren't too scared of the math), I can recommend this great article by Songho.
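A small numeric sketch of this effect (the values are made up): two clip-space positions with the same x but different w end up at very different NDC x values after the division.

glm::vec4 closeVtx(1.0f, 0.0f, 0.0f, 1.2f); // hypothetical vertex close to the viewer
glm::vec4 farVtx (1.0f, 0.0f, 0.0f, 8.0f);  // hypothetical vertex far from the viewer
float closeNdcX = closeVtx.x / closeVtx.w;  // ~0.83
float farNdcX   = farVtx.x / farVtx.w;      // 0.125: the distant vertex ends up much closer to the center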

A perspective projection matrix can be created in GLM as follows:

glm::mat4 proj = glm::perspective(glm::radians(45.0f), (float)width/(float)height, 0.1f, 100.0f);

What glm::perspective does is create again a frustum defining the visible space, anything outside the frustum will not end up in the clip space volume and will therefore be clipped. A perspective frustum can be thought of as a non-uniformly shaped box where each coordinate inside the box maps to a point in clip space. An image of the perspective frustum looks like this:

[Figure: the perspective viewing frustum between the near and far planes]

Its first parameter defines the fov value, which represents the field of view and sets how large the viewing space is. For a realistic view, this is usually set to 45 degrees, but for more doom-style results, you can set it to a higher value. The second parameter sets the aspect ratio, which is calculated by dividing the viewport's width by its height. The third and fourth parameters set the near and far planes of the frustum. We usually set near distance to 0.1 and far distance to 100.0. All vertices between the near and far planes and within the viewing frustum will be rendered.

When the near value of the perspective matrix is set too high (e.g. 10.0), OpenGL clips all coordinates close to the camera (between 0.0 and 10.0), which can give the visual result you may have seen in video games where you can see through certain objects when you get unreasonably close to them.

When using an orthographic projection, each vertex coordinate is mapped directly to clip space without any fancy perspective division (perspective division is still performed, but the w component is left untouched at 1.0, so it has no effect).

Since an orthographic projection doesn't apply perspective, distant objects do not appear smaller, which produces a weird visual output. For this reason, orthographic projection is mainly used for 2D rendering and for architectural or engineering applications where we'd rather not have vertices distorted by perspective. Applications such as Blender, used for 3D modeling, sometimes use orthographic projection for modeling because it more accurately depicts each object's dimensions. Below you will see a comparison of the two projection methods in Blender:
[Figure: the same Blender scene rendered with a perspective projection and with an orthographic projection]

You can see that with a perspective projection, distant vertices appear much smaller, whereas in an orthographic projection, each vertex is the same distance from the user.

8. Putting it all together

We create a transformation matrix for each of the above steps: model, view and projection matrices. Then convert the vertex coordinates to clip coordinates like so:
V_clip = M_projection · M_view · M_model · V_local

Note that the order of matrix multiplication is reversed (remember we need to read matrix multiplication from right to left). The resulting vertex should then be assigned to gl_Position in the vertex shader, and OpenGL will automatically perform perspective division and clipping.

The output of the vertex shader requires the coordinates to be in clip space, which is what we just did with the transformation matrices. OpenGL then performs perspective division on the clip-space coordinates to transform them to normalized device coordinates. OpenGL then uses the parameters from glViewport to map the normalized device coordinates to screen coordinates, where each coordinate corresponds to a point on the screen (in our case an 800x600 screen). This process is called the viewport transform.
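Assuming the 800x600 window from the earlier chapters, that viewport is set with:

glViewport(0, 0, 800, 600); // map NDC x,y in [-1, 1] onto an 800x600 pixel area starting at the lower-left corner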

This is a difficult subject to grasp, so don't worry if you're still not sure what each space is for. Below you will see how we can really get the most out of these coordinate spaces, and there will be enough examples in the next chapters.

9. Going 3D

Now that we know how to convert 3D coordinates to 2D coordinates, we can start rendering real 3D objects instead of the crappy 2D planes we've shown so far.

To start 3D drawing, we first create a model matrix. Model matrices consist of translations, scales and/or rotations that we wish to apply to transform all object vertices into global world space. Let's transform the plane a bit by rotating it on the x-axis so that it looks like it's lying on the floor. The model matrix looks like this:

glm::mat4 model = glm::mat4(1.0f);
model = glm::rotate(model, glm::radians(-55.0f), glm::vec3(1.0f, 0.0f, 0.0f)); 

By multiplying the vertex coordinates with this model matrix, we transform the vertex coordinates into world coordinates.

Next we need to create a view matrix. We want to move slightly backwards in the scene so the object becomes visible (when we are at the origin (0,0,0) in world space). To move around the scene, consider the following:

Moving the camera backward is the same as moving the entire scene forward.

This is exactly what a view matrix does: we move the entire scene, inversely, to where we want the camera to be.
Because we want to move backwards, and because OpenGL is a right-handed system, we have to move along the positive z-axis. We do this by translating the scene towards the negative z-axis, which gives the impression that we are moving backwards.

[Figure: the right-handed coordinate axes, with positive x to the right, positive y up, and positive z pointing out of the screen towards you]

Right-handed system

By convention, OpenGL is a right-handed system. What this basically means is that the positive x-axis is to your right, the positive y-axis points up, and the positive z-axis points backwards, towards you. Think of your screen as the center of the 3 axes: the positive z-axis goes through the screen towards you. The coordinate axes are drawn as shown in the figure above.

To understand why it's called right-handed, do the following:

  • Extend the right arm along the positive y-axis, with the hand up.
  • Make your thumb point to the right.
  • Point your index finger up.
  • Now bend the middle finger down 90 degrees.

If done correctly, the thumb should point in the positive x-axis direction, the index finger should point in the positive y-axis direction, and the middle finger should point in the positive z-axis direction. If you do this with your left arm, you'll see the z-axis reverse. This is called a left-handed system and is usually used by DirectX. Note that in normalized device coordinates, OpenGL actually uses a left-handed system (the projection matrix switches left-handedness).

We'll discuss how to move around the scene in more detail in the next chapter. Now the view matrix looks like this:

glm::mat4 view = glm::mat4(1.0f);
// note that we're translating the scene in the reverse direction of where we want to move
view = glm::translate(view, glm::vec3(0.0f, 0.0f, -3.0f)); 

The last thing we need to define is the projection matrix. We want to use perspective projection in our scene, so we will declare the projection matrix like this:

glm::mat4 projection;
projection = glm::perspective(glm::radians(45.0f), 800.0f / 600.0f, 0.1f, 100.0f);

Now that we have created the transformation matrices, we should pass them to the shader. First, we declare the transformation matrices as uniforms in the vertex shader, and multiply them with the vertex coordinates:

#version 330 core
layout (location = 0) in vec3 aPos;
...
uniform mat4 model;
uniform mat4 view;
uniform mat4 projection;

void main()
{
    // note that we read the multiplication from right to left
    gl_Position = projection * view * model * vec4(aPos, 1.0);
    ...
}

We should also send the matrices to the shader (this is usually done every frame, since transformation matrices tend to change a lot):

int modelLoc = glGetUniformLocation(ourShader.ID, "model");
glUniformMatrix4fv(modelLoc, 1, GL_FALSE, glm::value_ptr(model));
... // same for View Matrix and Projection Matrix
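One way the view and projection matrices could be sent, following the same pattern (a sketch, not the chapter's exact code; the setMat4 helper used later in this chapter would work just as well):

int viewLoc = glGetUniformLocation(ourShader.ID, "view");
glUniformMatrix4fv(viewLoc, 1, GL_FALSE, glm::value_ptr(view));
int projLoc = glGetUniformLocation(ourShader.ID, "projection");
glUniformMatrix4fv(projLoc, 1, GL_FALSE, glm::value_ptr(projection));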

Now that our vertex coordinates are transformed through the model, view and projection matrices, the final object should be:

  • Tilted backwards towards the floor.
  • A bit farther away from us.
  • Displayed with perspective (the farther away a vertex is, the smaller it should appear).

Let's check that the result actually meets these requirements:
[Figure: the textured plane tilted backwards, appearing to rest on an imaginary floor]

The plane does look like a 3D plane resting on some imaginary floor. If you don't get the same result, compare the code with the full source code.

10. More 3D

So far we've been working with a 2D plane, even in 3D space, so let's take the adventurous route and extend the 2D plane to a 3D cube. To render the cube we need a total of 36 vertices (6 faces * 2 triangles * 3 vertices each). 36 vertices is a lot to sum up, so you can retrieve them from here.

For fun, let's make the cube rotate over time:

model = glm::rotate(model, (float)glfwGetTime() * glm::radians(50.0f), glm::vec3(0.5f, 1.0f, 0.0f));  

Then we'll use glDrawArrays to draw the cube (since we didn't specify indices), but this time with 36 vertices:

glDrawArrays(GL_TRIANGLES, 0, 36);

You should get something similar to the following:

[Video: the rotating cube, with some of its faces drawn on top of others]

It does look a bit like a cube, but something is wrong. Some faces of the cube are drawn on other faces of the cube. This happens because as OpenGL draws the cube triangle by triangle, fragment by fragment, it overwrites any pixel colors that may have been drawn before. Since OpenGL does not guarantee the order of the rendered triangles (within the same draw call), some triangles will be drawn on top of each other even though one triangle should obviously be in front of the other.

Fortunately, OpenGL stores depth information in a buffer called the z-buffer, which allows OpenGL to decide when to draw on pixels and when not to. Using the z-buffer, we can configure OpenGL for depth testing.

11. Z-buffer

OpenGL stores all its depth information in the z-buffer (also known as the depth buffer). GLFW will automatically create such a buffer for you (just like it has a color buffer that stores the output image colors). The depth is stored in each fragment (as the fragment's z-value), and whenever a fragment wants to output its color, OpenGL compares its depth value to the z-buffer. If the current fragment is behind another fragment, that fragment will be discarded, otherwise it will be overwritten. This process is called depth testing and is done automatically by OpenGL.

However, if we want to make sure that OpenGL does perform depth testing, we first need to tell OpenGL that we want depth testing to be enabled; it is disabled by default. We can enable depth testing with glEnable. The glEnable and glDisable functions allow us to enable/disable certain features in OpenGL. The feature is then enabled/disabled until another call is made to disable/enable it. Now we want to enable depth testing by enabling GL_DEPTH_TEST:

glEnable(GL_DEPTH_TEST);  

Since we use a depth buffer, we also want to clear the depth buffer before each rendering iteration (otherwise the previous frame's depth information remains in the buffer). Just like clearing the color buffer, we can clear the depth buffer by specifying the DEPTH_BUFFER_BIT bit in the glClear function:

glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
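For context, here is a rough sketch of where these calls typically sit — glEnable once during setup, glClear at the start of every render iteration (your window variable and drawing code will differ):

glEnable(GL_DEPTH_TEST); // enable depth testing once, during initialization

while (!glfwWindowShouldClose(window))
{
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT); // clear color and depth buffers each frame

    // ... update matrices and draw the cube ...

    glfwSwapBuffers(window);
    glfwPollEvents();
}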

Let's rerun our program and see if OpenGL now performs depth testing:
[Video: the rotating textured cube, now correctly depth tested]

There we go! A fully textured cube that is properly depth tested and rotates over time. Check out the source code here.

12. More cubes!

Let's say we want to display 10 cubes on the screen. Each cube looks the same but differs only in where it is positioned in the world, and each has a different rotation. The graphical layout of the cube is already defined, so we don't have to change our buffers or attribute arrays when rendering more objects. The only thing we need to change for each object is its model matrix, with which we transform each cube into the world.

First, we define a translation vector for each cube, specifying its position in world space. We will define 10 cube positions in the glm::vec3 array:

glm::vec3 cubePositions[] = {
    glm::vec3( 0.0f,  0.0f,  0.0f), 
    glm::vec3( 2.0f,  5.0f, -15.0f), 
    glm::vec3(-1.5f, -2.2f, -2.5f),  
    glm::vec3(-3.8f, -2.0f, -12.3f),  
    glm::vec3( 2.4f, -0.4f, -3.5f),  
    glm::vec3(-1.7f,  3.0f, -7.5f),  
    glm::vec3( 1.3f, -2.0f, -2.5f),  
    glm::vec3( 1.5f,  2.0f, -2.5f), 
    glm::vec3( 1.5f,  0.2f, -1.5f), 
    glm::vec3(-1.3f,  1.0f, -1.5f)  
};

Now, in the render loop, we want to call glDrawArrays 10 times, but this time sending a different model matrix to the vertex shader before issuing each draw call. We'll create a small loop inside the render loop that renders our object 10 times, each time with a different model matrix. Note that we also add a small unique rotation to each container.

glBindVertexArray(VAO);
for(unsigned int i = 0; i < 10; i++)
{
    glm::mat4 model = glm::mat4(1.0f);
    model = glm::translate(model, cubePositions[i]);
    float angle = 20.0f * i; 
    model = glm::rotate(model, glm::radians(angle), glm::vec3(1.0f, 0.3f, 0.5f));
    ourShader.setMat4("model", model);

    glDrawArrays(GL_TRIANGLES, 0, 36);
}

This code updates the model matrix every time a new cube is drawn, a total of 10 times. Now we should see a world filled with 10 strangely rotating cubes:
[Figure: ten textured containers positioned and rotated throughout the scene]

Perfect! It looks like our container has found some like-minded friends. If you get stuck, see if you can compare your code with the source code.


Original link: 5 coordinate systems of OpenGL - BimAnt


Origin blog.csdn.net/shebao3333/article/details/131863539