Essential Study Notes for Getting Started with Unity Shader - Chapter 2 Rendering Pipeline

This series is a summary of my reading notes for Getting Started with Unity Shader.
Original author's blog: http://blog.csdn.net/candycat1992/article/
Book link: http://product.dangdang.com/23972910.html

Chapter 2 The Rendering Pipeline

2.1 Overview

The ultimate purpose of the rendering pipeline is to generate or render a two-dimensional texture, that is, all the effects we see on the computer screen. Its input is a virtual camera, some light sources, some Shaders, and textures.
The rendering process is divided into three stages: application stage, geometry stage, and rasterization stage.

Application stage:
This stage is driven by the developer and is usually implemented on the CPU.
It has three main tasks:
first, prepare the scene data, such as the camera position, the models in the scene, the light sources used, and so on;
second, perform coarse-grained culling, removing objects that are not visible so that they do not need to be handed to the geometry stage;
finally, set the render state of each model. These render states include the material, textures, shader, and so on.

The output is the geometric information required for rendering, that is, the rendering primitives, which will be passed to the next stage: the geometry stage.

Geometry stage (corresponds to the vertex function):
The geometry stage usually runs on the GPU and is responsible for processing each rendering primitive, performing per-vertex and per-polygon operations.
An important task of the geometry stage is to transform vertex coordinates into screen space and then hand them to the rasterizer. This stage outputs the two-dimensional vertex coordinates in screen space, the depth value of each vertex, shading data, and other related information, and passes them to the next stage.

Rasterization stage (corresponds to the fragment function):
This stage uses the data passed from the previous stage to generate the pixels on the screen and render the final image.
It also runs on the GPU.
The main task of rasterization is to decide which pixels of each rendering primitive should be drawn on the screen. It interpolates the per-vertex data obtained from the previous stage and then performs per-pixel processing.

2.2 Communication between CPU and GPU

The starting point of the rendering pipeline is the CPU, the application stage. It is roughly divided into three stages:
(1) Load the data into the video memory
(2) Set the rendering state
(3) Call the Draw Call

Loading data into video memory:
All data required for rendering first needs to be loaded from the hard disk into system memory. Then data such as meshes and textures are loaded into the storage space on the graphics card, the video memory. In real rendering, the data that needs to be loaded into video memory is often far more complicated than shown in the figure: for example, vertex positions, normal directions, vertex colors, texture coordinates, and so on.
[Figure: loading data from the hard disk into system memory and then into video memory]
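
As a sketch, this per-vertex data is what a Cg/HLSL vertex input struct in a Unity shader typically declares (the struct name and the exact set of fields below are illustrative, not taken from the book):

struct appdata {
    float4 vertex : POSITION;   // vertex position in model space
    float3 normal : NORMAL;     // normal direction
    float4 color  : COLOR;      // vertex color
    float2 uv     : TEXCOORD0;  // texture coordinates
};
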
Setting Rendering States:
Rendering states, in layman's terms, define how the meshes in the scene are rendered.
For example, which shader to use, light properties, materials, etc.
If we don't change the render state, then all meshes will use the same render state. The image below shows three different meshes drawn with the same render state.
[Figure: three different meshes rendered with the same render state]
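
In Unity, many of these render states are declared directly inside a ShaderLab Pass. A minimal sketch (the particular state values are illustrative defaults for an opaque object):

Pass {
    Cull Back      // cull back-facing triangles
    ZTest LEqual   // standard depth comparison
    ZWrite On      // write depth values of passing fragments
    Blend Off      // no blending for opaque objects
}
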
After the above work is ready, the CPU will call the rendering command Draw Call to tell the GPU to start rendering.

Calling Draw Call:
Draw Call is actually a command, its initiator is the CPU, and the receiver is the GPU.
This command simply points to a list of primitives that need to be rendered; it does not contain any material information, because that was already set up in the previous step (setting the render state).
When a Draw Call is given, the GPU will perform calculations based on the rendering state and all incoming vertex data, and finally output those beautiful pixels displayed on the screen. And this process is the GPU pipeline.

2.3 GPU pipeline

Developers do not have full control over the geometry and rasterization stages. They are implemented on the GPU, which greatly speeds up rendering by pipelining the work.
The geometry stage and the rasterization stage can each be divided into several smaller pipeline stages.
[Figure: GPU pipeline stages, colored by programmability]

Stage order: Vertex Shader → Tessellation Shader → Geometry Shader → Clipping → Screen Mapping → Triangle Setup → Triangle Traversal → Fragment Shader → Per-Fragment Operations.

Green: fully programmable
Yellow: configurable but not programmable
Blue: fixed-function, implemented by the pipeline

Solid boxes: stages the developer must program
Dotted boxes: optional stages

Geometry stage:
Vertex Shader: fully programmable; it is usually used to implement vertex-space transformations, per-vertex shading, and other functions.
Tessellation Shader: an optional shader used to subdivide primitives.
Geometry Shader: an optional shader that performs per-primitive shading operations or generates more primitives.
Clipping: clips away the vertices that are not in the camera's field of view and culls some triangle primitives. This stage is configurable.
Screen Mapping: neither configurable nor programmable; it transforms the coordinates of each primitive into the screen coordinate system.

Rasterization stage:
The triangle setup and triangle traversal stages are also fixed-function stages.
Fragment Shader (important): fully programmable; it is used to implement per-fragment shading operations.
The per-fragment operations stage is responsible for many important operations, such as modifying colors and the depth buffer, blending, and so on. It is not programmable, but it is highly configurable.

Vertex Shader:
The vertex shader is the first stage of the pipeline, and its input comes from the CPU. The processing unit of a vertex shader is a vertex; that is, the vertex shader is called once for each incoming vertex.

The work that the vertex shader needs to complete mainly includes: coordinate transformation and per-vertex lighting

Coordinate transformation: Perform some transformation on the coordinates of the vertices. One of the most basic tasks that a vertex shader must do is to convert vertex coordinates from model space to homogeneous clip space. We often see code like this in vertex shaders:

o.pos = mul(UNITY_MATRIX_MVP, v.position);  // model space -> homogeneous clip space
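
A slightly fuller sketch of a vertex shader that does only this transformation (the struct and function names are illustrative):

struct appdata { float4 position : POSITION; };
struct v2f     { float4 pos : SV_POSITION; };

v2f vert(appdata v) {
    v2f o;
    // transform the vertex from model space to homogeneous clip space
    o.pos = mul(UNITY_MATRIX_MVP, v.position);
    return o;
}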

Clipping:
Clip away the objects that are completely outside the camera's field of view, and clip the parts of primitives that lie outside the camera's field of view.
Screen mapping:
Converts the x and y coordinates of each primitive into the screen coordinate system. The screen coordinate system is a two-dimensional coordinate system; screen mapping does not perform any processing on the input z coordinate. In fact, the screen coordinates together with the z coordinate form what is called the window coordinate system.
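
As a sketch of this mapping (assuming OpenGL-style normalized device coordinates in [-1, 1] and a viewport starting at the origin; the variable names are illustrative):

// map NDC x/y in [-1, 1] to pixel coordinates; the z value is carried along unchanged
float2 screenPos = (ndc.xy * 0.5 + 0.5) * float2(pixelWidth, pixelHeight);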

Triangle setup:
This step begins the rasterization stage. The information output from the previous stage is the vertex positions in the screen coordinate system and the additional information associated with them, such as the depth value (z coordinate), normal direction, view direction, and so on.

The rasterization stage has two most important goals: calculating which pixels are covered by each primitive, and calculating their colors for those pixels.

Triangle Setup: this stage computes the information needed to rasterize a triangle mesh. The process of computing the data that represents the triangle mesh is called triangle setup, and its output prepares the data for the next stage.

Triangle Traversal:
This stage checks whether each pixel is covered by a triangle mesh. If it is covered, a fragment is generated. The process of finding which pixels are covered by a triangle mesh is triangle traversal; this stage is also known as scan conversion.
The triangle traversal stage determines which pixels are covered by a triangle mesh based on the results of the previous stage, and interpolates the information of the triangle's three vertices across the pixels of the entire covered area, as shown below.
[Figure: triangle traversal generating fragments over the covered pixels]
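
A sketch of the interpolation that happens for each covered pixel, written as Cg-style pseudocode with barycentric weights (perspective correction is ignored here, and all names are illustrative):

// w0 + w1 + w2 == 1 are the fragment's barycentric weights inside the triangle
float  depth = w0 * v0.depth + w1 * v1.depth + w2 * v2.depth;
float2 uv    = w0 * v0.uv    + w1 * v1.uv    + w2 * v2.uv;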

Fragment Shader:
It is a very important programmable shader stage.
The rasterization stage itself does not actually affect the color value of each pixel on the screen; instead, it generates a series of data that describes how a triangle mesh covers each pixel. Each fragment is responsible for storing such data.

The stage that really affects the pixels is the next stage of the pipeline - the fragment-by-fragment operation.

The input to the fragment shader is the result of interpolating the vertex information from the previous stage; more specifically, it is obtained by interpolating the data output by the vertex shader. Its output is one or more color values, as shown below.
[Figure: fragment shader input (interpolated vertex data) and output (color values)]
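
A minimal sketch of a fragment shader whose output is a single color value (the constant red color here is purely illustrative):

fixed4 frag(v2f i) : SV_Target {
    // return one color value for this fragment
    return fixed4(1.0, 0.0, 0.0, 1.0);
}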

Fragment-by-fragment operations:

There are several main tasks in this phase.
1) Determine the visibility of each fragment. This involves a lot of testing work, such as the depth test, the stencil test, and so on.
2) If a fragment passes all the tests, the color value of the fragment needs to be merged with the color already stored in the color buffer.

It should be pointed out that the fragment-by-fragment operation stage is highly configurable, that is, we can set the operation details of each step.
This stage first addresses the visibility of each fragment. This requires a series of tests.
[Figure: the sequence of tests and merge operations performed during per-fragment operations]

Look at the stencil test first. Related to it is the stencil buffer. In fact, the stencil buffer is almost the same kind of thing as the color buffer and depth buffer we often hear about. If the stencil test is enabled, the GPU first reads the stencil value of the fragment's position in the stencil buffer and then compares that value with a reference value. The comparison function can be specified by the developer, for example, discard the fragment when the value is less than the reference value, or discard it when it is greater than or equal to the reference value. If the fragment fails this test, it is discarded.
Regardless of whether a fragment passes the stencil test, we can modify the stencil buffer based on the result of the stencil test and the depth test that follows. This modification operation is also specified by the developer: the developer can set a different operation for each outcome, for example, keep the stencil buffer unchanged on failure and increment the value at the corresponding position by 1 on success. The stencil test is often used to limit the rendered area. There are also more advanced uses of the stencil test, such as rendering shadows and outline rendering.
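
In Unity ShaderLab, the stencil test described above is exposed as a Stencil block inside a Pass. A hedged sketch (the reference value, comparison function, and operations are illustrative):

Stencil {
    Ref 1        // reference value to compare against
    Comp Equal   // comparison function: pass when the buffer value equals Ref
    Pass Keep    // operation on the stencil buffer when the test passes
    Fail Keep    // operation on the stencil buffer when the test fails
}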

If a fragment passes the stencil test, it moves on to the depth test. This test is also highly configurable. If depth testing is enabled, the GPU compares the fragment's depth value with the depth value already stored in the depth buffer. This comparison function can also be set by the developer, for example, discard the fragment when its depth is less than the buffer value, or discard it when its depth is greater than or equal to the buffer value. Usually the comparison function is "less than or equal", that is, a fragment is discarded if its depth value is greater than or equal to the value currently in the depth buffer. This is because we always want to display only the objects closest to the camera; objects occluded by other objects do not need to appear on the screen. The closer to the camera, the smaller the depth, just like looking from the mouth of a well down to its bottom. If the fragment fails the test, it is discarded. Unlike the stencil test, a fragment that fails the depth test has no right to change the value in the depth buffer. If it passes the test, the developer can also specify whether the fragment's depth value should overwrite the existing depth value, which is controlled by turning depth writing on or off. We will see in later chapters that transparency effects are closely related to depth testing and depth writing.
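
In ShaderLab, the depth comparison and depth writing described above map to two state commands (the values shown are the usual defaults for opaque objects):

ZTest LEqual   // keep the fragment when its depth is less than or equal to the buffer value
ZWrite On      // write the passing fragment's depth into the depth buffer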

After the fragment passes the above two tests, it is merged.
Why is merging needed? We need to understand that rendering is a process of drawing one object after another onto the screen, and the color information of each pixel is stored in a place called the color buffer. Therefore, when we perform the current rendering, the color buffer usually already contains the color results of the previous rendering. So should we completely overwrite the previous results with the colors obtained in the current rendering, or do some other processing? That is what merging has to decide.
For opaque objects, the developer can turn off the blending operation. That way, the color value computed by the fragment shader directly overwrites the pixel value in the color buffer. But for semi-transparent objects, we need to use the blending operation to make the object look transparent. The figure below shows a simplified flow chart of the blending operation.
[Figure: simplified flow chart of the blending operation]

From the flow chart we can see that the blending operation is also highly configurable: the developer can choose to turn the blending function on or off. If blending is not enabled, the color in the color buffer is simply overwritten with the fragment's color, and this is why many beginners find that they cannot get a transparency effect (blending was not enabled). If blending is turned on, the GPU takes the source color and the destination color and mixes the two.
The source color is the color value produced by the fragment shader, while the destination color is the color value already in the color buffer.
Then a blending function is used to perform the blending operation. This blending function is usually closely related to the alpha channel, for example adding, subtracting, or multiplying according to the value of the alpha channel. Blending is very similar to working with layers in Photoshop: each layer can choose a blend mode, the blend mode determines how the layer blends with the layers below it, and the picture we see is the blended result.
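
A common ShaderLab blend setting for the semi-transparent case described above is standard alpha blending (one typical choice among many):

Blend SrcAlpha OneMinusSrcAlpha   // result = src.rgb * src.a + dst.rgb * (1 - src.a)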

The order of the tests given above is not the only possibility. Although logically these tests are performed after the fragment shader, for most GPUs these tests are performed before the fragment shader whenever possible. This is understandable: imagine the GPU spending a great deal of effort in the fragment shader stage to finally compute the fragment's color, only to discover that the fragment does not pass these tests at all and is discarded anyway. All the computation spent before that would be wasted!
[Figure: example scene with a sphere and a cuboid, used to illustrate early depth testing]

The scene in the figure contains two objects, a sphere and a cuboid. The drawing order is to first draw the sphere (displayed as a circle on the screen) and then the cuboid (displayed as a rectangle on the screen). If the depth test were performed after the fragment shader, then when rendering the cuboid, even though most of its area is occluded behind the sphere, that is, most of the fragments it covers cannot pass the depth test at all, we would still need to execute the fragment shader for those fragments, causing a large waste of performance.
When the model's primitives have been computed and tested stage by stage, they are displayed on our screen. What our screen displays is the color values in the color buffer. However, to prevent us from seeing primitives that are still being rasterized, the GPU uses a double-buffering strategy. This means that the rendering of the scene happens behind the scenes, in the back buffer. Once the scene has been rendered into the back buffer, the GPU swaps the contents of the back buffer and the front buffer (the front buffer holds the image previously shown on the screen), which ensures that the images we see are always continuous.
In fact, the real implementation is far more complicated than described above. Note that readers may find that the pipeline stage names and order given here differ from those in some other sources. One reason is that the implementations of graphics programming interfaces (such as OpenGL and DirectX) are not identical; another reason is that the GPU may perform many low-level optimizations, such as performing the depth test before the fragment shader as mentioned above.
Although the rendering pipeline is complex, Unity, as a very good platform, encapsulates a lot of functionality for us. Most of the time we only need to set some inputs in a Unity Shader, write the vertex shader and fragment shader, and set some states to achieve most common screen effects.
