Foreword

I spent the weekend studying rasterization from the ground up to understand it better. I will work through problems as I run into them and share what I learn along the way. This post sets up the concepts needed at the point where a 3D scene is about to be rendered to the screen: the next step is to draw the standard cube onto the screen so that we can finally see it.

Let's just look at this picture:

transformation process

So where does rasterization happen? After an object has gone through the MVP transformation, the space seen by the camera has been compressed into the standard cube [-1, 1]^3. Rasterization is the step that then draws this standard cube onto the screen.

convert

Before doing this step, we first need to know the following definitions:

Screen

In graphics, we abstract the screen as a two-dimensional array in which each element is called a pixel (short for picture element). The size of this array is the screen's resolution. For example, a screen resolution of 1920*1080 means the screen has 1920 × 1080 pixels. The screen is a typical raster display.

Raster devices

Examples of raster devices include oscilloscopes, cathode ray tubes (CRT, used in early displays), liquid crystal displays (LCD, which rely on the wave nature of light and the twisting of liquid crystals), light-emitting diode (LED) displays, and electrophoretic displays (e-ink screens, which have a low refresh rate).

equipment

Raster and Rasterization

Raster means screen in German, and rasterization is the process of drawing things onto the screen. The process is fairly involved and is the focus of this article; it is introduced below.

Pixel

A pixel is the smallest unit of the screen. We can think of it as a small square whose color can be defined by RGBA, and a pixel (small square) holds only one color.

Of course, as technology has developed, the internal structure and layout of pixels have become more complex. The following picture, for example, shows the pixel layout of two mobile phone screens:

image-20220116153836643

In the iPhone screen on the left, a pixel has three colors.

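We call the layout of the Samsung phone on the right the Bayer pattern. It has more green dots than red or blue ones, because the human eye is most sensitive to green, so this looks more comfortable to the eye.

In this article we still assume that each pixel holds only one color.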

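Defining screen space

As mentioned above, we want to draw the standard cube onto the screen. To do that we first need to define screen space, and then transform the standard cube into that space.

Defining screen space amounts to setting up a coordinate system on the screen. Here we take the bottom-left corner of the screen as the origin, with the x-axis pointing right and the y-axis pointing up. (Note: there are many possible conventions. For example, the origin could be the top-left corner instead; just stay consistent with your own definition in later steps.)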

screen

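As mentioned, the screen is made up of pixels. If the two-dimensional pixel array has size w × h, there are w columns along the x-axis and h rows along the y-axis. Taking each pixel to be 1 × 1 in size, the whole screen has size w × h. In the figure above, each small square represents one pixel.

This lets us identify each pixel by a coordinate (x, y), which can be read as the coordinate of the bottom-left corner of the corresponding square. For example, (0, 0) is the pixel in the bottom-left corner of the screen, and since coordinates start at 0, the pixel in the top-right corner is (w-1, h-1).

Since each pixel is a small square, it also has a center (the dot in the middle of each square in the figure): the center of pixel (x, y) is (x+0.5, y+0.5).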
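The pixel conventions above can be written down as a few small helpers. This is only a sketch under the assumptions stated in the text (1 × 1 pixels, origin at the bottom-left corner); the function names pixelCenter, pixelContaining and onScreen are invented here for illustration.

```cpp
#include <cmath>
#include <utility>

// Center of pixel (x, y) in screen space, assuming 1 x 1 pixels and a
// bottom-left origin as defined in the text.
std::pair<double, double> pixelCenter(int x, int y) {
    return {x + 0.5, y + 0.5};
}

// Index of the pixel whose square contains the screen-space point (sx, sy).
std::pair<int, int> pixelContaining(double sx, double sy) {
    return {static_cast<int>(std::floor(sx)), static_cast<int>(std::floor(sy))};
}

// Whether pixel (x, y) exists on a w x h screen: valid indices run from
// (0, 0) to (w-1, h-1).
bool onScreen(int x, int y, int w, int h) {
    return x >= 0 && x < w && y >= 0 && y < h;
}
```

On a 1920*1080 screen, for instance, onScreen(1919, 1079, 1920, 1080) holds while onScreen(1920, 0, 1920, 1080) does not, which matches the top-right pixel being (w-1, h-1).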

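Viewport Transform

With screen space defined, drawing the MVP-transformed standard cube onto the screen starts with transforming it into this w × h space. This transformation is called the viewport transform.

It consists of two steps:

Scale the standard cube, whose extent along both the x-axis and the y-axis is 2, into a cuboid whose extents along x and y are w and h.
Translate that cuboid so that its center moves from the origin to (w/2, h/2).

Note: we ignore the z-axis for now; it will come into play later.

The transformation matrix is simple, so we skip the derivation. The result is: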

matrix
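Following the two steps described above (scale x by w/2 and y by h/2, then translate by (w/2, h/2), leaving z alone), the viewport matrix can be written out and applied as in the sketch below. The names Vec4, Mat4, viewportMatrix and apply are made up for this illustration, and points are assumed to be column vectors in homogeneous coordinates.

```cpp
#include <array>

using Vec4 = std::array<double, 4>;
using Mat4 = std::array<std::array<double, 4>, 4>;

// Viewport matrix for a w x h screen:
//   | w/2   0    0   w/2 |
//   |  0   h/2   0   h/2 |
//   |  0    0    1    0  |
//   |  0    0    0    1  |
// x and y are scaled and shifted; z is passed through unchanged.
Mat4 viewportMatrix(double w, double h) {
    return {{
        {w / 2, 0,     0, w / 2},
        {0,     h / 2, 0, h / 2},
        {0,     0,     1, 0},
        {0,     0,     0, 1},
    }};
}

// Matrix-vector product m * v.
Vec4 apply(const Mat4& m, const Vec4& v) {
    Vec4 r{0, 0, 0, 0};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            r[i] += m[i][j] * v[j];
    return r;
}
```

With this matrix the corner (-1, -1) of the standard cube lands on (0, 0), the corner (1, 1) lands on (w, h), and the origin lands on the screen center (w/2, h/2).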


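Rasterization

After the viewport transform, the standard cube has become a space that matches the screen in the x and y directions, but it still contains our three-dimensional objects: people, buildings, plants, and so on. The screen is made of pixels, so the two-dimensional image we see on it is really built from a huge number of pixels. The next step is therefore to break the three-dimensional objects in this space down into pixels, and this process is what we call rasterization.

Meshes and triangles

In everyday life, whether through a camera or the human eye, we only ever see the surface of an object, so what we need to show on screen is just the surfaces of these three-dimensional objects. A surface can be thought of as being made up of many planar pieces; a cuboid, for example, is made of six rectangles. In graphics we break a surface down into a large number of small triangles. Woven together like a net, they can form the surface of any three-dimensional object we want, and a surface built from triangles in this way is called a mesh. A cuboid, for example, consists of 12 triangles, because each rectangle can be split into two triangles. The figures below show the decomposition of more complex surfaces: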

mesh
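To make the cuboid example concrete, here is one possible way to store a mesh: a list of vertices plus a list of triangles that index into it. The Vec3 and Mesh types and the makeCuboid function are hypothetical names for this sketch, and the winding order of the triangles is deliberately not considered.

```cpp
#include <array>
#include <vector>

struct Vec3 { double x, y, z; };

struct Mesh {
    std::vector<Vec3> vertices;
    std::vector<std::array<int, 3>> triangles;  // indices into vertices
};

// Cuboid with one corner at the origin and extents sx, sy, sz.
Mesh makeCuboid(double sx, double sy, double sz) {
    Mesh m;
    // 8 corners: bit 0 of i selects x, bit 1 selects y, bit 2 selects z.
    for (int i = 0; i < 8; ++i)
        m.vertices.push_back({(i & 1) ? sx : 0.0, (i & 2) ? sy : 0.0, (i & 4) ? sz : 0.0});
    // The six rectangular faces, each listed as corner indices going around it.
    const int faces[6][4] = {
        {0, 1, 3, 2},  // z = 0
        {4, 5, 7, 6},  // z = sz
        {0, 1, 5, 4},  // y = 0
        {2, 3, 7, 6},  // y = sy
        {0, 2, 6, 4},  // x = 0
        {1, 3, 7, 5},  // x = sx
    };
    // Split every rectangle into two triangles along one diagonal.
    for (const auto& f : faces) {
        m.triangles.push_back({f[0], f[1], f[2]});
        m.triangles.push_back({f[0], f[2], f[3]});
    }
    return m;
}
```

Six faces split into two triangles each give the 12 triangles mentioned above, sharing 8 vertices.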

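Why triangles? They have the following advantages:

The triangle is the most basic polygon: any other polygon can be split into triangles.
The cross product can tell us whether a point is inside or outside a triangle, which does not work for polygons with concave parts.
Given different attributes at the three vertices, we can produce a gradient inside the triangle, that is, compute the attribute at any interior point by interpolation.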

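Rasterizing a single triangle

With this, the problem becomes simpler again: rasterizing three-dimensional objects means rasterizing a large number of triangles. Simplifying further, let us first look at how to rasterize a single triangle in the xy-plane (ignoring the z-axis). In the figure below, the small cells bounded by solid black lines are the pixels, and the dashed gray lines are guides that make the pixel centers easier to see: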

triangle


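One thing is certain: after rasterization the result cannot look like the figure above. A pixel can hold only one color, and the figure clearly violates that. Look at pixel (1, 2), for example: only part of it is red. Should this pixel have no color, or be entirely red? In graphics we define that a pixel belongs to a triangle if the pixel's center lies inside the triangle. Pixel (2, 2) in the example is such a case: the figure clearly shows its center inside the triangle, so the whole pixel should be red.

Of course we cannot judge this by eye, so a key step in rasterization is to determine whether a pixel's center is inside or outside the triangle. This is where triangles pay off, because the cross product can decide it. (We will not go into the details. In short, take the cross product of each edge vector with the vector from that edge's starting vertex to the point, and check whether the three resulting z components all point the same way. If they do, the point is inside the triangle.) We can define a function for this test: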

bool isInside(const Triangle& t, double x, double y);  // true if (x, y) lies inside triangle t

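The function body performs the cross-product test and returns true if the point is inside the triangle and false otherwise. The parameter t holds the triangle's data (the x and y coordinates of its three vertices), and the parameters x and y give the position of the point.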
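One way the body of isInside could look is sketched below. The Point and Triangle structs and the cross helper are assumptions made for this example, since the article does not fix a data layout. Points lying exactly on an edge are counted as inside here, which is a choice of this sketch rather than something the article specifies.

```cpp
struct Point { double x, y; };
struct Triangle { Point a, b, c; };

// z component of the cross product (q - p) x (r - p).
// Positive if r is to the left of the directed line p -> q, negative if to the right.
double cross(const Point& p, const Point& q, const Point& r) {
    return (q.x - p.x) * (r.y - p.y) - (q.y - p.y) * (r.x - p.x);
}

// True if (x, y) is inside t: the point must be on the same side of all
// three edges. Works for both clockwise and counter-clockwise vertex order.
bool isInside(const Triangle& t, double x, double y) {
    const Point p{x, y};
    const double d1 = cross(t.a, t.b, p);
    const double d2 = cross(t.b, t.c, p);
    const double d3 = cross(t.c, t.a, p);
    const bool hasNegative = d1 < 0 || d2 < 0 || d3 < 0;
    const bool hasPositive = d1 > 0 || d2 > 0 || d3 > 0;
    // Mixed signs mean the point is on different sides of different edges.
    return !(hasNegative && hasPositive);
}
```

For the triangle with vertices (0, 0), (4, 0), (0, 4), the point (1, 1) gives three positive cross products and is inside, while (3, 3) gives mixed signs and is outside.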

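Once we can relate any point on the screen to the triangle, we visit every pixel and pass the x and y of its center to this function. This operation is called sampling.

The following figure may make this clearer: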

sampling


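Sampling

What is sampling? In general, it is the process of drawing individuals or samples from a population. In programming terms: given a continuous function, we evaluate it at a set of different inputs. For example, given some function y = f(x), we compute y at x = 1, x = 2, x = 3, and so on. Sampling discretizes a function, and it is used widely in graphics.

Sampling frequency: the sampling frequency can be understood through the spacing between samples, where a smaller spacing means a higher frequency. Above we sampled at x = 1, x = 2, x = 3, ..., a spacing of 1. Sampling at x = 1, x = 3, x = 5, ... gives a spacing of 2, which means a lower frequency. Sampling at x = 0.5, x = 1, x = 1.5, ... gives a spacing of 0.5, which means a higher frequency.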
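As a small illustration of sampling a function at a fixed spacing, consider the sketch below. The sample function and its parameters (start, step, count) are names chosen for this example only.

```cpp
#include <functional>
#include <vector>

// Evaluate f at x = start, start + step, start + 2 * step, ...
// and return the first `count` values. A smaller step means a higher
// sampling frequency over the same range.
std::vector<double> sample(const std::function<double(double)>& f,
                           double start, double step, int count) {
    std::vector<double> values;
    for (int i = 0; i < count; ++i)
        values.push_back(f(start + i * step));
    return values;
}
```

Calling sample with start = 1 and step = 1 reproduces the x = 1, 2, 3, ... case from the text, and step = 0.5 doubles the sampling frequency.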

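Applied to the screen: we have already transformed the space seen by the camera into one that matches the screen in the x and y directions. That space can be thought of as made up of infinitely many continuous points (ignoring z). The screen, in turn, is made of pixels, and the points in that space corresponding to the pixel centers are the samples we draw from all the points in the space. The sampling spacing is therefore the physical size of a pixel. So for a screen of fixed size, the higher the resolution (more pixels, with a smaller physical distance between pixel centers), the higher the sampling frequency.

The above samples positions in two-dimensional space. We can also sample in time. The figure below, for example, shows a person swinging at a ball, sampled at different moments: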

img

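As noted, the screen consists of width * height pixels, so to sample the center of every pixel we simply loop over all pixels and pass the x and y of each center to the function defined above: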

for(int x = 0; x < width; x++){
    for(int y = 0; y < height; y++){
        pixel[x][y] = isInside(t, x + 0.5, y + 0.5); // the pixel center is the pixel coordinate (x, y) plus 0.5
    }
}

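This tells us every pixel inside the triangle. In the figure below, the pixels whose center points are marked in black are the ones inside the triangle.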

test

Exercise: what does the code do if we change it as follows?

for(int x = 0; x < width; x++){
    for(int y = 0; y < height; y++){
        pixel1[x][y] = isInside(t, x + 0.25, y + 0.25);
        pixel2[x][y] = isInside(t, x + 0.25, y + 0.75);
        pixel3[x][y] = isInside(t, x + 0.75, y + 0.25);
        pixel4[x][y] = isInside(t, x + 0.75, y + 0.75);
    }
}

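The answer is simple: instead of sampling the center of each pixel, it divides each pixel into the four blocks shown below and samples the center of each block.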

four
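The four results per pixel could, for example, be combined into a count of how many sub-sample points the triangle covers. The article does not say what to do with the four values, so the sketch below is just one possible use. The Point and Triangle types, the cross-product version of isInside (edge points count as inside), and coveredSubsamples are all assumptions of this example.

```cpp
struct Point { double x, y; };
struct Triangle { Point a, b, c; };

// z component of the cross product (q - p) x (r - p).
double cross(const Point& p, const Point& q, const Point& r) {
    return (q.x - p.x) * (r.y - p.y) - (q.y - p.y) * (r.x - p.x);
}

// Same-side test against all three edges; edge points count as inside.
bool isInside(const Triangle& t, double x, double y) {
    const Point p{x, y};
    const double d1 = cross(t.a, t.b, p);
    const double d2 = cross(t.b, t.c, p);
    const double d3 = cross(t.c, t.a, p);
    const bool hasNegative = d1 < 0 || d2 < 0 || d3 < 0;
    const bool hasPositive = d1 > 0 || d2 > 0 || d3 > 0;
    return !(hasNegative && hasPositive);
}

// Number of the four sub-sample points of pixel (x, y) that lie inside t.
// The sub-samples are the centers of the pixel's four quarter blocks.
int coveredSubsamples(const Triangle& t, int x, int y) {
    const double offsets[2] = {0.25, 0.75};
    int covered = 0;
    for (double dx : offsets)
        for (double dy : offsets)
            if (isInside(t, x + dx, y + dy)) ++covered;
    return covered;
}
```

For a triangle whose left edge is the vertical line x = 1.5, pixel (1, 1) has its two left sub-samples outside and its two right sub-samples inside, so the count is 2, whereas a single center sample would only report inside or outside.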


Bounding Box

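The sampling above visits the center of every pixel on the screen, but a triangle may be very small and cover only a few pixels, in which case this approach wastes a lot of work. We can use a bounding box to narrow the sampling range. In the figure below, any pixel that could be inside the triangle must lie within the blue region, and this blue region is the bounding box. It is also called an axis-aligned bounding box, commonly abbreviated AABB.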

bounding box


So, given the three vertices of the triangle, we only need the minimum and maximum of their x coordinates and the minimum and maximum of their y coordinates to define the bounding box, and then we only need to sample inside it.

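// xmin, xmax, ymin, ymax: bounding box of the triangle's three vertices,
// assumed here to lie within the screen
for(int x = xmin; x < xmax; x++){
    for(int y = ymin; y < ymax; y++){
        pixel[x][y] = isInside(t, x + 0.5, y + 0.5);
    }
}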
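Putting the pieces together, a self-contained version of the bounding-box loop might look like the sketch below. The Point and Triangle types, the cross-product isInside (edge points count as inside), and the rasterize function are assumptions of this example. Rounding the box outward and clamping it to the screen are additions made here so that triangles partly off screen do not index outside the array.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

struct Point { double x, y; };
struct Triangle { Point a, b, c; };

// z component of the cross product (q - p) x (r - p).
double cross(const Point& p, const Point& q, const Point& r) {
    return (q.x - p.x) * (r.y - p.y) - (q.y - p.y) * (r.x - p.x);
}

// Same-side test against all three edges; edge points count as inside.
bool isInside(const Triangle& t, double x, double y) {
    const Point p{x, y};
    const double d1 = cross(t.a, t.b, p);
    const double d2 = cross(t.b, t.c, p);
    const double d3 = cross(t.c, t.a, p);
    const bool hasNegative = d1 < 0 || d2 < 0 || d3 < 0;
    const bool hasPositive = d1 > 0 || d2 > 0 || d3 > 0;
    return !(hasNegative && hasPositive);
}

// Returns pixel[x][y] == true for every pixel whose center is inside t.
std::vector<std::vector<bool>> rasterize(const Triangle& t, int width, int height) {
    std::vector<std::vector<bool>> pixel(width, std::vector<bool>(height, false));
    // Bounding box of the three vertices, rounded outward to whole pixels
    // and clamped to the screen.
    const int xmin = std::max(0, static_cast<int>(std::floor(std::min({t.a.x, t.b.x, t.c.x}))));
    const int xmax = std::min(width, static_cast<int>(std::ceil(std::max({t.a.x, t.b.x, t.c.x}))));
    const int ymin = std::max(0, static_cast<int>(std::floor(std::min({t.a.y, t.b.y, t.c.y}))));
    const int ymax = std::min(height, static_cast<int>(std::ceil(std::max({t.a.y, t.b.y, t.c.y}))));
    // Only pixels inside the bounding box are sampled.
    for (int x = xmin; x < xmax; ++x)
        for (int y = ymin; y < ymax; ++y)
            pixel[x][y] = isInside(t, x + 0.5, y + 0.5);
    return pixel;
}
```

For the triangle (1, 1), (6, 1), (1, 6) on an 8 × 8 screen, only the 5 × 5 block of pixels from (1, 1) to (5, 5) is sampled instead of all 64.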

This cuts down the amount of sampling considerably, but there are special cases where the triangle itself is small and its bounding box is still very large, as in the figure below:

special

Such cases can be handled specially, for example by computing a bounding range for each row and traversing it from left to right, as follows.

line by line boundingBox

How to find the left and right boundaries of each row, and how to detect that a triangle falls into this special case, will be covered in a follow-up post.

With the above, we can find the pixels on the screen that lie inside the triangle. Coloring those pixels gives the following result:

image-20220116161928711

This image is the result of rasterizing our single triangle; it is what the triangle actually looks like on the screen.

Clearly this looks quite different from the original triangle: its edges are jagged, the so-called jaggies. I will continue to study this and write it up later, so stay tuned.

Finally, here is a brief summary, shown in the following picture:

Summarize

Rasterization of WEBGL