Overview of the rasterization algorithm

Raster rendering is undoubtedly the most commonly used technique for rendering images of 3D scenes, and yet it is probably the least understood and least documented of all rendering techniques (especially compared to ray tracing).

Why this is the case comes down to several factors. First of all, it is a technique of the past. We are not saying the technique is obsolete, quite the contrary, but most of the techniques used to generate images with this algorithm were developed between the 1960s and the early 1980s. In the field of computer graphics, that is practically the Middle Ages, and the knowledge contained in the papers in which these techniques were developed is often lost. Rasterization is also the technique used by GPUs to produce 3D graphics. Hardware technology has changed a lot since GPUs were first invented, but the fundamental techniques they implement to produce images haven't changed much since the early 1980s (the hardware has changed, but the underlying pipeline by which the image is formed has not). In fact, these techniques are so fundamental, and consequently so deeply integrated into the hardware architecture, that no one pays attention to them anymore (only the people designing GPUs know what they are doing, and designing a GPU is far from a trivial task; but designing a GPU and understanding how the rasterization algorithm works are two different things, so explaining the latter shouldn't be that hard!).

Regardless, we believe it is urgent and important to correct this situation, and we believe this course is the first resource to offer a clear and complete picture of the algorithm, as well as a full practical implementation of the technique.


1. Introduction to rasterization

Rasterization and ray tracing both attempt to solve the visibility, or hidden surface, problem, but in a different order (the visibility problem is covered in the Rendering 3D Scene Images, Overview course). What the two algorithms have in common is that they essentially rely on geometric techniques to solve the problem. In this lesson, we will briefly describe how the rasterization algorithm works. Understanding the principle is quite simple, but implementing it requires using a series of techniques, notably from the field of geometry, which you will also find explained in this course.

The program we will develop in this course to demonstrate how rasterization works in practice is important because we will use it again in the next course to implement the ray tracing algorithm. Implementing both algorithms in the same program will allow us to more easily compare the output produced by the two rendering techniques (at least before shading is applied they should produce the same results) and performance. This would be a great way to better understand the pros and cons of both algorithms.

2. Rendering algorithm

There is not just one rasterization algorithm but several, though to get straight to the point, we can say that all of these algorithms are based on the same overall principle. In other words, they are all variations of the same idea. It is this idea, or principle, that we will refer to when we speak of rasterization in this lesson.

What's the idea? In previous lessons, we have already discussed the difference between rasterization and ray tracing. We also suggested that the rendering process can essentially be broken down into two main tasks: visibility and shading. Rasterization is essentially a solution to the visibility problem. Visibility consists of being able to tell which parts of 3D objects are visible to the camera. Some parts of these objects can be culled because they are either outside the camera's visible area or hidden by other objects.

Figure 1: In ray tracing, we trace a ray through the center of each pixel in the image and then test whether that ray intersects any geometry in the scene. If an intersection is found, we set the pixel color to the color of the object the ray intersected. Since a ray may intersect multiple objects, we need to keep track of the closest intersection distance.

Solving this problem can basically be done in two ways: ray tracing and rasterization.

2.1 Ray tracing

You can trace a ray through each pixel of the image to find the distance between the camera and whatever object (if any) that ray intersects. The object visible through a pixel is the object with the smallest intersection distance (usually denoted t). This is the technique used in ray tracing. Note that in this particular case, you create the image by looping over all the pixels in the image, tracing a ray for each pixel, and then finding out whether these rays intersect any of the objects in the scene. In other words, the algorithm requires two main loops. The outer loop iterates over the pixels in the image, and the inner loop iterates over the objects in the scene:

for (each pixel in the image) { 
    Ray R = computeRayPassingThroughPixel(x,y); 
    float tclosest = INFINITY; 
    Triangle triangleClosest = NULL; 
    for (each triangle in the scene) { 
        float thit; 
        if (intersect(R, triangle, thit)) { 
             if (thit < tclosest) { 
                 tclosest = thit; 
                 triangleClosest = triangle; 
             } 
        } 
    } 
    if (triangleClosest) { 
        imageAtPixel(x,y) = triangleColorAtHitPoint(triangleClosest, tclosest); 
    } 
} 

Note that in this example, the objects are considered to be made of triangles (and triangles only). Instead of iterating over the objects, we treat the objects as a pool of triangles and iterate over the triangles instead. Triangles are often used as the basic rendering primitive in both ray tracing and rasterization (GPUs require the geometry to be triangulated), for reasons we have already explained in previous lessons.
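
For illustration, here is a minimal sketch of the kind of data layout this pseudocode assumes. The Vec3f, Vec2f, and Triangle types and the scene container are hypothetical names we introduce for this course, not part of any particular library:

#include <vector>

// Minimal data layout assumed by the pseudocode in this lesson (hypothetical types).
struct Vec3f { float x, y, z; };
struct Vec2f { float x, y; };

struct Triangle {
    Vec3f v[3];   // the three vertices of the triangle, in camera space
    Vec3f color;  // a single, flat color per triangle (r, g, b)
};

// The scene is nothing more than a pool of triangles: meshes made of quads or
// other polygons are triangulated first, and their triangles are appended here.
std::vector<Triangle> scene;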

2.2 Rasterization

Ray tracing is the first possible solution to the visibility problem. We say the technique is image-centric because we shoot rays from the camera into the scene (we start from the image), rather than the other way around, which is the approach we will be using in rasterization:

Figure 2: Rasterization can be roughly broken down into two steps. We first project the 3D vertices that make up the triangle onto the screen using perspective projection. We then loop over all the pixels in the image and test whether they lie within the resulting 2D triangle. If a pixel lies within the triangle, we fill it with the triangle's color.

Rasterization takes the opposite approach. To solve the visibility problem, it actually "projects" triangles onto the screen; in other words, we use perspective projection to convert a triangle from a 3D representation to a 2D representation. This is easily done by projecting the vertices that make up the triangle onto the screen (using perspective projection, as just explained). The next step in the algorithm is to use some technique to fill in all the pixels of the image that are covered by this 2D triangle. These two steps are illustrated in Figure 2. From a technical point of view, they are very simple to perform. The projection step only requires a perspective divide and a remapping of the resulting coordinates from image space to raster space, a process we have already covered in previous lessons. Finding out which pixels of the image are covered by the resulting triangle is also very simple, and will be described later.

What does this algorithm look like compared to ray tracing methods? First, note that in rasterization, in the outer loop, we need to iterate over all the triangles in the scene, rather than first iterating over all the pixels in the image. Then, in the inner loop, we iterate over all pixels in the image and find out whether the current pixel is "contained" within the "projected image" of the current triangle (Figure 2). In other words, the inner and outer loops of the two algorithms are swapped:

// rasterization algorithm
for (each triangle in scene) { 
    // STEP 1: project the vertices of the triangle using perspective projection
    Vec2f v0 = perspectiveProject(triangle.v[0]); 
    Vec2f v1 = perspectiveProject(triangle.v[1]); 
    Vec2f v2 = perspectiveProject(triangle.v[2]); 
    for (each pixel in image) { 
        // STEP 2: is this pixel contained in the projected image of the triangle?
        if (pixelContainedIn2DTriangle(v0, v1, v2, x, y)) { 
            image(x,y) = triangle.color; 
        } 
    } 
}
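
The function perspectiveProject is left undefined in the pseudocode above. As a reminder of the previous lessons, here is a minimal sketch of what it could look like. It assumes vertices are already expressed in camera space, a camera looking down the negative z-axis, a canvas 2 units wide and high located at z = -1, and global imageWidth and imageHeight variables (all of these are assumptions of this sketch, not requirements of the algorithm):

// A minimal sketch of the projection step, under the assumptions stated above.
Vec2f perspectiveProject(const Vec3f &vCamera)
{
    // perspective divide: x and y are divided by the vertex's depth
    float screenX = vCamera.x / -vCamera.z;
    float screenY = vCamera.y / -vCamera.z;

    // remap from screen space [-1, 1] to NDC space [0, 1] ...
    float ndcX = (screenX + 1) / 2;
    float ndcY = (screenY + 1) / 2;

    // ... and finally from NDC space to raster space; the y-axis is flipped
    // because raster coordinates increase from top to bottom
    Vec2f raster;
    raster.x = ndcX * imageWidth;
    raster.y = (1 - ndcY) * imageHeight;
    return raster;
}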

The algorithm is object-centric in that we actually start from the geometry and walk back to the image, as opposed to the approach used in ray tracing, where we start from the image and walk back to the scene.

The principles of both algorithms are simple, but they differ slightly in complexity when it comes to implementing them and solving the various problems they need to solve. In ray tracing, generating the rays is simple, but finding the intersection of a ray with the geometry can be difficult (depending on the type of geometry you are dealing with) and computationally expensive. But let's leave ray tracing aside for now. In the rasterization algorithm, we need to project vertices onto the screen, which is simple and fast, and we will see that the second step, finding out whether a pixel is contained within the 2D representation of a triangle, has an equally simple geometric solution.

In other words, computing an image with the rasterization approach relies on two very simple and fast techniques (the perspective projection process and determining whether a pixel lies within a 2D triangle). Rasterization is a good example of an "elegant" algorithm. The techniques it relies on have simple solutions; they are also easy to implement and produce predictable results. For all these reasons, the algorithm is very well suited to the GPU and is the rendering technique GPUs apply to generate images of 3D objects (it can also easily be run in parallel).

To summarize, the advantages of rasterization are as follows:

  • Converting geometry to triangles makes the process simpler. If all primitives are converted to the triangle primitive, we can write fast and efficient functions to project triangles onto the screen and check whether pixels lie within these 2D triangles.
  • Rasterization is object-centric. We project geometry onto the screen and determine its visibility by looping over all the pixels in the image.
  • Rasterization relies essentially on two techniques: projecting vertices onto the screen and finding out whether a given pixel lies within a 2D triangle.
  • The rendering pipeline running on the GPU is based on a rasterization algorithm.

Fast rendering of 3D Z-buffered linearly interpolated polygons is a fundamental problem on state-of-the-art workstations. Generally speaking, the problem consists of two parts: 1) 3D transformation, projection and lighting calculation of the vertices, 2) rasterizing the polygons into the framebuffer. — Parallel Algorithms for Polygon Rasterization, Juan Pineda, 1988

The term "rasterization" comes from the fact that polygons (in this case triangles) are broken down into pixels in some way, and as we all know, an image composed of pixels is called a raster image. Technically, this process is called rasterizing a triangle into an image for the framebuffer.

Rasterization is the process of determining which pixels lie inside a triangle, nothing more. — Rasterization by Larrabee, Michael Abrash
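
We will study this inside-triangle test in detail in Chapter 3, but to make the two quotes above a little more concrete, here is a hedged sketch of the most common solution, the edge function described in Pineda's paper: a point lies inside the triangle if it falls on the same side of all three edges. The function names match our pseudocode, but the implementation itself is only one possible variant:

// Pineda's edge function: its sign tells on which side of the edge going from
// a to b the point p lies (and its magnitude is twice the area of the triangle
// formed by a, b and p).
float edgeFunction(const Vec2f &a, const Vec2f &b, const Vec2f &p)
{
    return (p.x - a.x) * (b.y - a.y) - (p.y - a.y) * (b.x - a.x);
}

// A pixel (tested at its center) is contained in the projected 2D triangle if
// it lies on the same side of its three edges. Checking that the three signs
// agree makes this sketch independent of the triangle's winding order.
bool pixelContainedIn2DTriangle(const Vec2f &v0, const Vec2f &v1, const Vec2f &v2,
                                int x, int y)
{
    Vec2f p = { x + 0.5f, y + 0.5f };  // sample at the pixel center
    float w0 = edgeFunction(v1, v2, p);
    float w1 = edgeFunction(v2, v0, p);
    float w2 = edgeFunction(v0, v1, p);
    return (w0 >= 0 && w1 >= 0 && w2 >= 0) || (w0 <= 0 && w1 <= 0 && w2 <= 0);
}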

Hopefully, at this point in the course, you can see how an image of a 3D scene made of triangles can be generated with the rasterization approach. Of course, what we have described so far is the simplest form of the algorithm. First, it can be greatly optimized; furthermore, we haven't explained yet what happens when two triangles projected onto the screen overlap the same pixels in the image. When that happens, how do we decide which of the two (or more) triangles is visible to the camera? Let's answer these two questions now.

3. Optimization: 2D triangle bounding box

Figure 3: To avoid iterating over all pixels in the image, we can iterate over all pixels contained in the 2D triangle bounding box.

The problem with the naive implementation of the rasterization algorithm we have given so far is that, in the inner loop, it requires iterating over all the pixels in the image, even though a triangle may only cover a small fraction of those pixels (as shown in Figure 3). Of course, this depends on the size of the triangle on the screen. But considering that we are not interested in rendering one triangle but an object made up of potentially hundreds to millions of triangles, it is unlikely that, in a typical production scene, these triangles will be very large in the image.

Figure 4: After calculating the bounding box around the triangle, we can loop through all the pixels contained in the bounding box and test whether they overlap with the 2D triangle.

There are several ways to minimize the number of test pixels, but the most common method involves computing the 2D bounding box of the projected triangle and iterating over the pixels contained in that 2D bounding box rather than over the pixels of the entire image. While some of these pixels may still be outside the triangle, at least on average it can already significantly improve the performance of the algorithm. This idea is illustrated in Figure 3.

Computing the 2D bounding box of a triangle is very simple. We just need to find the minimum and maximum x-coordinates and y-coordinates of the three vertices that make up the triangle in raster space. The following pseudocode illustrates this:

// convert the vertices of the current triangle to raster space
Vec2f bbmin = INFINITY, bbmax = -INFINITY; 
Vec2f vproj[3]; 
for (int i = 0; i < 3; ++i) { 
    vproj[i] = projectAndConvertToNDC(triangle.v[i]); 
    // coordinates are in raster space but still floats, not integers
    vproj[i].x *= imageWidth; 
    vproj[i].y *= imageHeight; 
    if (vproj[i].x < bbmin.x) bbmin.x = vproj[i].x; 
    if (vproj[i].y < bbmin.y) bbmin.y = vproj[i].y; 
    if (vproj[i].x > bbmax.x) bbmax.x = vproj[i].x; 
    if (vproj[i].y > bbmax.y) bbmax.y = vproj[i].y; 
} 

Once the 2D bounding box of the triangle (in raster space) has been computed, it is just a matter of looping over the pixels defined by that box. But you need to be careful about the way you convert the raster coordinates, which in our code are defined as floats rather than integers.

First, note that one or two of the vertices may be projected outside the boundaries of the canvas. Their raster coordinates may therefore be lower than 0 or greater than the image size. We solve this problem by clamping the pixel coordinates to the range [0, Image Width - 1] for the x-coordinate and [0, Image Height - 1] for the y-coordinate. Additionally, we need to round the minimum and maximum coordinates of the bounding box to the nearest integer values (note that this works fine when we iterate over the pixels in the loop, because we initialize the loop variable to xmin or ymin and loop while x or y is lower than or equal to xmax or ymax). All these tests need to be applied before using the final fixed-point (or integer) bounding box coordinates in the loop. Here is the pseudocode:

... 
uint xmin = std::max(0, std::min(imageWidth - 1, (int)std::floor(bbmin.x))); 
uint ymin = std::max(0, std::min(imageHeight - 1, (int)std::floor(bbmin.y))); 
uint xmax = std::max(0, std::min(imageWidth - 1, (int)std::floor(bbmax.x))); 
uint ymax = std::max(0, std::min(imageHeight - 1, (int)std::floor(bbmax.y))); 
for (y = ymin; y <= ymax; ++y) { 
    for (x = xmin; x <= xmax; ++x) { 
        // check if the current pixel lies within the 2D triangle
        if (pixelContainedIn2DTriangle(v0, v1, v2, x, y)) { 
            image(x,y) = triangle.color; 
        } 
    } 
} 

4. Image or frame buffer

Our goal is to produce an image of the scene. We have two ways of visualizing the result of the program: either displaying the rendered image directly on the screen, or saving the image to disk and previewing it later with a program such as Photoshop. But in both cases, we need to store the image while it is being rendered, and for that we use what is called in CG an image buffer or framebuffer. It is nothing more than a 2D array of colors with the dimensions of the image. Before the rendering process starts, the framebuffer is created and all its pixels are set to black. At render time, when triangles are rasterized, if a given pixel overlaps a given triangle, we store the color of that triangle in the framebuffer at that pixel location. When all the triangles have been rasterized, the framebuffer contains an image of the scene. All that remains to be done is to display the content of the buffer on the screen or save it to a file. In this lesson, we will choose the latter option.
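
As an illustration, here is a minimal sketch of how such a framebuffer could be allocated and written to disk. It reuses the hypothetical Vec3f type from the earlier sketch (with x, y, z acting as r, g, b), assumes colors in the [0, 1] range and global imageWidth and imageHeight variables, and uses the plain PPM format only because it can be written in a few lines; any other image format would do:

#include <algorithm>
#include <fstream>
#include <vector>

// Before rendering starts, the framebuffer (a 2D array of colors stored as a
// flat array of imageWidth * imageHeight values) is created with every pixel
// set to black.
std::vector<Vec3f> framebuffer(imageWidth * imageHeight, Vec3f{0, 0, 0});

// Once every triangle has been rasterized, the content of the buffer is saved
// to disk as a binary PPM file.
void saveToPPM(const char *filename, const std::vector<Vec3f> &buffer,
               int width, int height)
{
    std::ofstream ofs(filename, std::ios::binary);
    ofs << "P6\n" << width << " " << height << "\n255\n";
    for (const Vec3f &c : buffer) {
        ofs << (unsigned char)(std::min(1.f, c.x) * 255)
            << (unsigned char)(std::min(1.f, c.y) * 255)
            << (unsigned char)(std::min(1.f, c.z) * 255);
    }
}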

5. Depth buffer (or Z buffer)

Remember, the goal of the rasterization algorithm is to solve visibility problems. To display a 3D object, you must determine which surfaces are visible. In the early days of computer graphics, two methods were used to solve the "hidden surface" problem (another name for the visibility problem): Newell's algorithm and z-buffers. For historical reasons, we only mention Newell's algorithm, but we will not study it in this lesson because it is no longer used. We will only look at the z-buffer method used by GPUs.

Figure 5: When a pixel overlaps two triangles, we set the pixel color to the color of the triangle with the smallest distance from the camera.

There is one last thing we need to do in order for the basic rasterizer to work properly. We need to consider the fact that multiple triangles may overlap the same pixel in the image (as shown in Figure 5). When this happens, how do we decide which triangle is visible? The solution to this problem is very simple. We'll be using what we call a z-buffer, also known as a depth buffer, two terms you've probably heard or read about a lot.

The z-buffer is nothing more than another 2D array with the same dimensions as the image, except that rather than being an array of colors, it is an array of floats. Before we start rendering the image, we initialize every pixel of this array to a very large number. When a pixel overlaps a triangle, we also read the value stored in the z-buffer at that pixel location. As you may have guessed, this array is used to store, for each pixel of the image, the distance from the camera to the nearest triangle overlapping that pixel. Since this value is initially set to infinity (or any very large number), when we first find that a given pixel X overlaps a triangle T1, the distance from the camera to that triangle is necessarily lower than the value stored in the z-buffer for that pixel. All we have to do then is replace the stored value with the distance to T1. Next, when testing the same pixel against another triangle T2, we compare the distance from the camera to T2 with the distance already stored in the z-buffer (which at this point is the distance to T1). If the distance to the second triangle is lower than the distance to the first, then T2 is visible and T1 is hidden by T2; otherwise, T1 remains visible and T2 is hidden by T1. In the first case, we update the value in the z-buffer with the distance to T2; in the second case, there is no need to update the z-buffer, since the first triangle T1 is still the closest triangle found for that pixel so far. As you can see, the z-buffer is used to store, for each pixel of the image, the distance to the nearest object in the scene (we don't really use this distance directly, but we will give more details on this later in the course).

In Figure 5, we can see that the red triangle is behind the green triangle in 3D space. If we render the red triangle first and then the green triangle, for a pixel that overlaps both triangles, the z-buffer at that pixel location will successively hold a very large number (set when the z-buffer is initialized), then the distance to the red triangle, and finally the distance to the green triangle.

You may be wondering how we find the distance from the camera to the triangle. Let's first look at a pseudocode implementation of the algorithm; we will come back to this point later (for now, we will assume that the function pixelContainedIn2DTriangle computes this distance for us):

// A z-buffer is just an 2D array of floats
float *zbuffer = new float[imageWidth * imageHeight]; 
// initialize the distance for each pixel to a very large number
for (uint32_t i = 0; i < imageWidth * imageHeight; ++i) 
    zbuffer[i] = INFINITY; 
 
for (each triangle in scene) { 
    // project vertices
    ... 
    // compute bbox of the projected triangle
    ... 
    for (y = ymin; y <= ymax; ++y) { 
        for (x = xmin; x <= xmax; ++x) { 
            // check if the current pixel lies within the triangle
            float z;  // distance from the camera to the triangle 
            if (pixelContainedIn2DTriangle(v0, v1, v2, x, y, z)) { 
                // If the distance to that triangle is lower than the distance stored in the
                // z-buffer, update the z-buffer and update the image at pixel location (x,y)
                // with the color of that triangle
                if (z < zbuffer(x,y)) { 
                    zbuffer(x,y) = z; 
                    image(x,y) = triangle.color; 
                } 
            } 
        } 
    } 
} 

6. What’s next?

This article is just a very high-level description of the rasterization algorithm (Figure 6), but it should give you an idea of what we will need in the program to generate an image. We will need:

  • an image buffer (a 2D array of colors),
  • a depth buffer (a 2D array of floats),
  • triangles (the geometry making up the scene),
  • a function to project the vertices of a triangle onto the canvas,
  • a function to rasterize a projected triangle,
  • some code to save the contents of the image buffer to disk.

Figure 6: Schematic diagram of rasterization algorithm

In the next chapter, we will see how coordinates are converted from camera space to raster space. The method is of course the same one we learned and introduced in the previous lesson; however, we will cover a few more tricks along the way. In Chapter 3, we will learn how to rasterize triangles. In Chapter 4, we will study in detail how the z-buffer algorithm works.


Original link: Overview of Rasterization Algorithm - BimAnt
