Unity Shader Learning-1. Rendering Pipeline

First, understand what a pipeline (assembly line) is. (The following definition is from Baidu Encyclopedia.)

Pipeline: a pipeline, also known as an assembly line, is an industrial production method in which each production unit focuses only on processing one segment of the work, so as to improve efficiency and output.

1. What is the rendering pipeline

1. Concept

The concept of the rendering pipeline is consistent with the pipeline concept above. Its main task is to take a three-dimensional scene as input and output a two-dimensional image. This process is done by the CPU and GPU together.

2. The three stages of the rendering pipeline

The rendering pipeline can be divided into three stages: the application stage, the geometry stage, and the rasterization stage.

a. Application stage (CPU processing)

This stage is led by the developer, and there are 3 main tasks in this stage:
- First, you need to prepare the scene data (camera position, frustum, models, light sources, etc.)
- Then, coarse-grained culling needs to be done
- Finally, the rendering state of each model needs to be set (the material, textures, shader used, etc.)

The most important output of this stage is the geometric information required for rendering, that is, rendering primitives, which can be points, lines, triangles, etc.

b. Geometry stage (GPU processing)

The geometry stage mainly handles everything related to the geometry we want to draw. It is responsible for processing each rendering primitive and performing per-vertex and per-polygon operations. This stage can be further divided into smaller pipeline stages.
An important task of the geometry stage is to transform vertex coordinates into screen space and then hand them over to the rasterizer for processing.
Summary: input rendering primitives -> output the two-dimensional screen-space coordinates of each vertex, along with the depth, shading, and other information corresponding to each vertex

c. Rasterization stage (GPU processing)

This stage uses the data passed from the previous stage to generate pixels on the screen and render the final image. The main task is to decide which pixels of each rendering primitive should be drawn on the screen.

3. Communication between CPU and GPU

The starting point of the rendering pipeline is the CPU, that is, the application stage. The application stage can be divided into the following three steps:

  • 1. Load the data into the video memory
  • 2. Set the rendering state
  • 3. Call Draw Call

1. Load data into video memory

The basic steps are: textures, meshes, and other data are loaded from the hard disk into system memory, and from there into video memory. Once the data has been loaded into video memory, it can be removed from system memory. However, the CPU still needs access to some data, such as the mesh data used for collision detection, and that data is kept.

2. Set the rendering state

The rendering state defines how the meshes in the scene are to be rendered, for example which vertex shader or fragment shader to use, the light source properties, the material, and so on.
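In Unity, much of this rendering state is declared directly in a ShaderLab pass. A minimal sketch (the shader name is hypothetical):

```
Shader "Custom/RenderStateDemo"
{
    SubShader
    {
        Pass
        {
            // Render state, declared before any shader program runs:
            Cull Back    // cull triangles that face away from the camera
            ZWrite On    // write surviving fragments' depth to the depth buffer
            Blend Off    // opaque rendering: no blending with the color buffer

            // ... the pass's vertex/fragment program would follow here ...
        }
    }
}
```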

3. Call Draw Call

A `Draw Call` is a command whose initiator is the CPU and whose receiver is the GPU. Given a Draw Call, the GPU performs `computation` according to the rendering state (material, textures, shader, etc.) and all the input vertex data, and finally outputs pixels on the screen. This computation process is the GPU pipeline.

4. GPU pipeline

When the GPU receives the Draw Call command sent by the CPU, it performs a series of pipeline operations and finally renders the primitives to the screen. The geometry stage and the rasterization stage are both carried out on the GPU, and developers cannot fully control the implementation details of these two stages. The GPU pipeline can be subdivided into different pipeline stages.

(Figure: the GPU pipeline subdivided into its stages)

(1) Vertex shader:

As can be seen from the figure, the GPU pipeline receives vertex data as input, which is first passed to the vertex shader. The processing unit of the vertex shader is the vertex: the vertex shader is called once for each incoming vertex. (The vertex shader itself cannot create or destroy any vertices, and cannot obtain the relationships between vertices.)

The vertex shader is fully programmable, and its main tasks are coordinate transformation and per-vertex lighting.
- Coordinate transformation: performing some transformation on the vertex coordinates, converting them from model space into homogeneous clip space. We can simulate water surfaces, cloth, and so on through coordinate transformation, as in the sketch below.
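A minimal sketch of such a vertex shader in Unity, with a hypothetical sine-wave displacement of the kind used to fake a water surface (the shader and property names are made up for illustration):

```
Shader "Custom/WaveDemo"
{
    Properties { _WaveAmp ("Wave Amplitude", Float) = 0.1 }
    SubShader
    {
        Pass
        {
            CGPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            #include "UnityCG.cginc"

            float _WaveAmp;

            struct appdata { float4 vertex : POSITION; float3 normal : NORMAL; };
            struct v2f     { float4 pos : SV_POSITION; };

            // Called once per incoming vertex; cannot create or destroy vertices
            v2f vert (appdata v)
            {
                v2f o;
                // Fake a ripple: displace the vertex along its normal over time
                v.vertex.xyz += v.normal * _WaveAmp * sin(_Time.y + v.vertex.x);
                // Coordinate transformation: model space -> homogeneous clip space
                o.pos = UnityObjectToClipPos(v.vertex);
                return o;
            }

            fixed4 frag (v2f i) : SV_Target { return fixed4(0.2, 0.5, 1.0, 1.0); }
            ENDCG
        }
    }
}
```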

(2) Tessellation shader:

is an optional shader, mainly used to subdivide primitives.

(3) Geometry shader:

is an optional shader that can perform per-primitive shading operations or generate more primitives.

(4) Clipping:

This stage is configurable. Its purpose is to clip away the vertices that are not in view and to cull the faces of certain triangle primitives.
There are three relationships between a primitive and the camera's field of view: completely within the field of view, partially within the field of view, and completely outside the field of view.
- Primitives that are completely in view will continue to the next pipeline stage
- Primitives that are completely out of view will not be passed down
- Primitives that are partially in view need to be clipped

Unlike the vertex shader, this step is not programmable. We cannot control the clipping process with code; it is a fixed operation performed by the hardware.

(5) Screen mapping:

This stage is a fixed operation. It is responsible for converting the coordinates of each primitive from the three-dimensional coordinate system into screen coordinates (a two-dimensional coordinate system); the coordinates entering this step are still three-dimensional. The screen coordinates obtained by screen mapping determine which pixel on the screen a vertex corresponds to and how far it is from that pixel.
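For the common case of a w × h viewport under the OpenGL convention (origin at the lower left), the standard mapping from normalized device coordinates in [-1, 1] to screen coordinates is:

$$x_{screen} = \frac{w}{2}\,(x_{ndc} + 1), \qquad y_{screen} = \frac{h}{2}\,(y_{ndc} + 1)$$

The z coordinate is not changed by screen mapping; it is carried along as the depth value.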
Note: OpenGL treats the lower-left corner of the screen as the minimum window coordinate, while DirectX defines the upper-left corner of the screen as the minimum window coordinate. (Can't they just unify it?)
(Figure: OpenGL's lower-left vs. DirectX's upper-left window-coordinate origin)
Such differences cause plenty of pitfalls for developers. If the images you get during development turn out upside down, it may well be because of this problem.

(6) Triangle setup: (entering the rasterization stage)

Rasterization (光栅化):
- Definition: converting a primitive into an image composed of a grid of pixels
- Process: converting vertex data into fragments
- Feature: each fragment corresponds to a pixel in the frame buffer

The data we get from the previous stage are the vertex positions in the screen coordinate system and the information associated with them, such as depth values, normal directions, and view directions.
The goals of the rasterization stage are: 1. calculate which pixels are covered by each primitive; 2. calculate the colors of those pixels.

Triangle setup is the first stage of rasterization; it computes the information needed to rasterize a triangle mesh. The output of the previous stage is the vertices of the triangle mesh, so if we want to obtain the coverage of the whole triangle, we must compute the pixel coordinates on each edge to get a representation of the triangle's boundary. This process of obtaining a representation of the triangle's boundaries is triangle setup.

(7) Triangle traversal:

This stage checks whether each pixel is covered by a triangle. If it is covered, a fragment is generated, so the process of finding which pixels are covered by a triangle is triangle traversal.

This stage uses the representation of the triangle mesh obtained in the previous stage to determine which pixels are covered by the triangle mesh, and uses the information of the triangle's 3 vertices to interpolate across the pixels of the whole covered area, as in the formula below. The output of this stage is a sequence of fragments.
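Concretely, any per-vertex attribute f (depth, normal, texture coordinates, ...) is interpolated at a pixel p using the pixel's barycentric coordinates with respect to the triangle's three vertices (in practice the GPU uses a perspective-correct variant of these weights):

$$f(p) = \lambda_1 f(v_1) + \lambda_2 f(v_2) + \lambda_3 f(v_3), \qquad \lambda_1 + \lambda_2 + \lambda_3 = 1$$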
Note: a fragment is not yet a true pixel, but a collection of states used to calculate the final color of each pixel. These states include screen coordinates, depth information, and the vertex information output from the geometry stage, such as normals and texture coordinates.

(8) Fragment shader:

The input of the fragment shader is the result of interpolating the vertex information from the previous stage; more precisely, it is obtained by interpolating the data output by the vertex shader. The output of this stage is one or more color values. Many important rendering techniques, such as texture sampling, are done at this stage, but its limitation is that it can only affect a single fragment.
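A minimal texture-sampling sketch in Unity (the shader name is hypothetical). Note that the uv the vertex shader writes out arrives in frag already interpolated by the rasterizer:

```
Shader "Custom/TextureSampleDemo"
{
    Properties { _MainTex ("Texture", 2D) = "white" {} }
    SubShader
    {
        Pass
        {
            CGPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            #include "UnityCG.cginc"

            sampler2D _MainTex;

            struct appdata { float4 vertex : POSITION; float2 uv : TEXCOORD0; };
            struct v2f     { float4 pos : SV_POSITION; float2 uv : TEXCOORD0; };

            v2f vert (appdata v)
            {
                v2f o;
                o.pos = UnityObjectToClipPos(v.vertex);
                o.uv = v.uv;   // passed on; interpolated per fragment by the rasterizer
                return o;
            }

            // Runs once per fragment; can only affect this one fragment
            fixed4 frag (v2f i) : SV_Target
            {
                return tex2D(_MainTex, i.uv);   // texture sampling
            }
            ENDCG
        }
    }
}
```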

(9) Per-fragment operations: (the last step of the rendering pipeline)

The purpose of this stage is merging. So what data is merged?
The main tasks of this stage are:
- Determine the visibility of each fragment. This involves the depth test, the stencil test, and so on.
- If a fragment passes all the tests, its color value is merged, or blended, with the color already stored in the color buffer.

This stage is not programmable, but it is highly configurable: we can set the details of each step.

Visibility:

This stage first needs to resolve the visibility of each fragment. Each fragment undergoes the following tests; if it fails a test at any point, it is discarded.
(Figure: the sequence of per-fragment tests a fragment must pass)


Stencil test:
If the stencil test is enabled, the GPU first reads the stencil value at the fragment's position in the stencil buffer, then compares that value with a reference value (which can be specified by the developer). The developer can choose, for example, to discard the fragment when the buffer value is less than the reference, or when it is greater than or equal to it. Stencil testing is usually used to limit the rendered area, and it also has more advanced uses, such as rendering shadows and outline rendering.
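In Unity, the stencil test is configured with a ShaderLab Stencil block. A sketch of a pass that draws only where the stencil buffer already holds the value 1:

```
Pass
{
    Stencil
    {
        Ref 1        // the developer-specified reference value
        Comp Equal   // keep the fragment only where buffer value == Ref
        Pass Keep    // leave the stencil buffer unchanged when the test passes
    }
    // ... the pass's vertex/fragment program would follow here ...
}
```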

Depth test:
If a fragment is lucky enough to pass the stencil test, the depth test is performed next. If depth testing is enabled, the GPU compares the fragment's depth value with the depth value already stored in the depth buffer. The comparison function is also set by the developer: you can choose to discard the fragment when its depth is greater than the stored value, or when it is less than or equal to it. Usually the comparison is "less than or equal", because we normally want to display only the objects closest to the camera, and fragments occluded by other objects need not appear on the screen. Unlike the stencil test, a fragment that fails the depth test has no right to modify the value in the depth buffer. If it passes the test, the developer can decide whether the fragment's depth value overwrites the value in the buffer by turning depth writing on or off.
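The corresponding ShaderLab state for the usual "less than or equal" comparison, with depth writing turned on, is a sketch like this:

```
Pass
{
    ZTest LEqual   // discard fragments farther away than the stored depth
    ZWrite On      // fragments that pass overwrite the depth buffer (Off leaves it untouched)
    // ... the pass's vertex/fragment program would follow here ...
}
```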

Blending:
For opaque objects, the developer can turn off the blending operation, so that the color value calculated by the fragment shader directly overwrites the pixel value in the color buffer. For translucent objects, we need to turn blending on to make the object look translucent. Blending is highly configurable: developers can turn it on or off, and if it is on, the GPU takes the source color and the destination color and mixes the two. The source color is the color value produced by the fragment shader; the destination color is the color value already in the color buffer.
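A sketch of the classic alpha-blending state for translucent objects in ShaderLab; the two blend factors spell out how source and destination colors are mixed:

```
Pass
{
    ZWrite Off                        // translucent objects usually don't write depth
    Blend SrcAlpha OneMinusSrcAlpha   // final = src.a * source + (1 - src.a) * destination
    // ... the pass's vertex/fragment program would follow here ...
}
```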

Why the transparency (alpha) test can degrade performance: if we perform a transparency test in the fragment shader and, when a fragment fails it, call an operation in the shader (such as clip) to discard it manually, the GPU can no longer run the various tests ahead of the fragment shader. Modern GPUs therefore check whether the operations in a fragment shader conflict with early testing; if they do, early testing is disabled. This means more fragments have to be processed, so performance degrades. Below is a minimal sketch of such a manual discard.
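This sketch reuses the v2f and _MainTex declarations from the texture-sampling example above and assumes a hypothetical _Cutoff material property:

```
float _Cutoff;   // assumed alpha threshold property

fixed4 frag (v2f i) : SV_Target
{
    fixed4 col = tex2D(_MainTex, i.uv);
    // Manually discard fragments whose alpha falls below the threshold;
    // exactly the kind of operation that can force early testing off
    clip(col.a - _Cutoff);
    return col;
}
```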

2. Keyword Q&A

1. OpenGL and DirectX

Directly accessing the GPU is very troublesome for developers and may require dealing with various registers and video memory; graphics programming interfaces implement a layer of abstraction on top of this hardware.
OpenGL and DirectX are such graphics application programming interfaces, and the long rivalry between them is a story of its own. These interfaces build a communication bridge between upper-layer applications and the underlying GPU: the application sends rendering commands to the interfaces, the interfaces forward those commands to the display driver, and the driver translates them into a language the GPU can understand, letting it do the work.

2. HLSL, GLSL and CG

All three are shader programming languages.
- HLSL: High Level Shading Language, the shader language of DirectX. Microsoft controls the compilation of shaders, so even on different hardware the compilation result is the same. The platforms where it can be used are relatively limited, almost all of them Microsoft's own products, such as Windows and Xbox 360.
- GLSL: OpenGL Shading Language, OpenGL's shader language. Its advantage is being cross-platform: it can be used on Windows, Mac, Linux, and even mobile platforms. This portability comes from the fact that OpenGL does not provide a shader compiler; compilation of the shader is done by the graphics card driver. In other words, GLSL works as long as the display driver supports compiling it.
- CG: C for Graphics, NVIDIA's shader language. It achieves true cross-platform support: it is compiled into the appropriate intermediate language depending on the platform.

3. Draw Call

The meaning of a Draw Call itself is very simple: the CPU calls a graphics programming interface, such as glDrawElements in OpenGL or DrawIndexedPrimitive in DirectX.

1. How do CPU and GPU work in parallel?

The main solution is the command buffer. The command buffer contains a command queue: the CPU adds commands to it, and the GPU reads commands from it; adding and reading are independent. This allows the CPU and GPU to work independently of each other. When the CPU needs to render an object, it adds a command to the command buffer, and when the GPU finishes its previous rendering task, it takes the next command from the queue and executes it.

2. Why does too many Draw Calls affect the frame rate?

Before each Draw Call, the CPU needs to send a lot of things to the GPU, including data, state, and commands, and it must do a lot of work such as checking the rendering state. Only once the CPU has completed these preparations can the GPU start rendering. The GPU renders much faster than the CPU can submit commands, so the performance bottleneck appears on the CPU: if the number of Draw Calls is too large, the CPU spends a lot of time submitting them and becomes overloaded.

3. How to reduce Draw Call?

The main solution is batching (Batch), which merges many small Draw Calls into one Draw Call. Of course, not all cases can be merged: we can merge meshes, but the merging process is time-consuming, so batching is better suited to static meshes.
Points to note when merging (a GPU-instancing sketch follows the list):

  • Avoid using a lot of very small meshes, and when using such small meshes is unavoidable, consider whether they can be merged.
  • Avoid using too many materials, because meshes that share the same material are easier to merge.
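Besides merging meshes, Unity can also collapse many copies of the same mesh into a single Draw Call with GPU instancing, provided the shader opts in. A minimal instancing-ready sketch using the standard UnityCG instancing macros (shader and property names are hypothetical):

```
Shader "Custom/InstancedColorDemo"
{
    Properties { _Color ("Color", Color) = (1,1,1,1) }
    SubShader
    {
        Pass
        {
            CGPROGRAM
            #pragma vertex vert
            #pragma fragment frag
            #pragma multi_compile_instancing   // opt in to GPU instancing
            #include "UnityCG.cginc"

            struct appdata
            {
                float4 vertex : POSITION;
                UNITY_VERTEX_INPUT_INSTANCE_ID   // carries the per-instance ID
            };
            struct v2f
            {
                float4 pos : SV_POSITION;
                UNITY_VERTEX_INPUT_INSTANCE_ID
            };

            // Per-instance properties live in an instancing constant buffer
            UNITY_INSTANCING_BUFFER_START(Props)
                UNITY_DEFINE_INSTANCED_PROP(fixed4, _Color)
            UNITY_INSTANCING_BUFFER_END(Props)

            v2f vert (appdata v)
            {
                v2f o;
                UNITY_SETUP_INSTANCE_ID(v);
                UNITY_TRANSFER_INSTANCE_ID(v, o);
                o.pos = UnityObjectToClipPos(v.vertex);
                return o;
            }

            fixed4 frag (v2f i) : SV_Target
            {
                UNITY_SETUP_INSTANCE_ID(i);
                // Each instance can have its own color yet share one Draw Call
                return UNITY_ACCESS_INSTANCED_PROP(Props, _Color);
            }
            ENDCG
        }
    }
}
```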

4. What is the fixed-function pipeline?

Fixed-function pipeline, fixed pipeline for short, usually refers to the rendering pipeline implemented on older GPUs. Developers do not have full control over such a pipeline; they only get some configuration operations, often just on/off switches.
