Gong Da video study notes: God's perspective on GPU (2) - logical module division

Foreword

In the last issue, we introduced what kind of pipeline a basic GPU should contain. As times changed, new requirements gradually emerged. Let's take a look at how the basic graphics pipeline gradually expanded into what it is today.

1-geometry shader

The vertex shader and pixel shader mentioned above are both single-input, single-output structures: each accepts one input unit, processes it, and outputs one result. But such a pipeline is missing a capability: what if the unit we want to process is neither a vertex nor a pixel, but a primitive? Then it cannot be done. This demand gave birth to a new shader, the geometry shader, which in effect splits open the primitive assembler.

After the vertex shader outputs the processed vertices, the whole primitive is first sent to the geometry shader; once it has been processed, the rest of the primitive assembler's work proceeds as before.


Compared with the first two shaders, the geometry shader has one big distinguishing feature: single input, multiple output.


A primitive entering the geometry shader can produce multiple primitives as output. So we can move a whole triangle, or cut one triangle into several. Its existence allows the GPU to handle more flexible tasks such as non-uniform output.

For example, the first triangle outputs one triangle, the second outputs five, and the third outputs three.
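To make single input, multiple output concrete, here is a minimal CPU-side C++ sketch of the idea (illustrative only, not real shader code and not any specific API): a geometry-shader-like function receives one triangle and appends however many triangles it likes to the output stream.

```cpp
#include <array>
#include <cstdio>
#include <vector>

struct Vec3 { float x, y, z; };
using Triangle = std::array<Vec3, 3>;

// A geometry-shader-like function: one primitive in, any number out.
// This one splits the triangle into three around its centroid; another
// input could just as well produce one output, five, or none at all.
void geometryShader(const Triangle& in, std::vector<Triangle>& out) {
    Vec3 c{ (in[0].x + in[1].x + in[2].x) / 3.0f,
            (in[0].y + in[1].y + in[2].y) / 3.0f,
            (in[0].z + in[1].z + in[2].z) / 3.0f };
    out.push_back(Triangle{ in[0], in[1], c });
    out.push_back(Triangle{ in[1], in[2], c });
    out.push_back(Triangle{ in[2], in[0], c });
}

int main() {
    Triangle t{{ {0,0,0}, {1,0,0}, {0,1,0} }};
    std::vector<Triangle> output;
    geometryShader(t, output);   // non-uniform amplification is allowed
    std::printf("1 triangle in, %zu out\n", output.size());
}
```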

Another difference: the vertex shader and pixel shader are both required. If you don't specify them, the whole pipeline cannot be strung together. The geometry shader is optional: if it is not specified, the stages before and after it are connected directly.


The primitives output by the geometry shader do not have to enter the primitive assembler and go through the rest of the pipeline.

They can also be written directly out to memory; this process is called stream output. Of course, you can also stream out directly from the vertex shader without specifying a geometry shader. This gives the pipeline the ability to export data from its middle.

Sometimes we process the vertices in the vertex buffer once, save the results, and then reuse them repeatedly to reduce repeated computation.
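As a rough illustration of the benefit, here is a hypothetical C++ sketch (plain CPU code, not a real streaming API): run the expensive per-vertex work once, keep the output, and let every later pass read the cached buffer.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

struct Vertex { float x, y, z; };

// Stand-in for expensive per-vertex work (skinning, morphing, ...).
Vertex transformVertex(const Vertex& v) {
    return { v.x * 2.0f, v.y * 2.0f, std::sin(v.z) };
}

int main() {
    std::vector<Vertex> vertexBuffer(10000, Vertex{1, 2, 3});

    // "Stream output": run the vertex work once and keep the results...
    std::vector<Vertex> cached;
    cached.reserve(vertexBuffer.size());
    for (const Vertex& v : vertexBuffer)
        cached.push_back(transformVertex(v));

    // ...then every later pass (shadow map, main color pass, ...) reads
    // the cached buffer instead of redoing the per-vertex work.
    for (int pass = 0; pass < 3; ++pass) {
        float checksum = 0.0f;
        for (const Vertex& v : cached) checksum += v.x;
        std::printf("pass %d reused %zu precomputed vertices (%.0f)\n",
                    pass, cached.size(), checksum);
    }
}
```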


But there is a problem here. The geometry shader seems very flexible and can do all kinds of things, yet once you use it, you will find its performance is extremely low. Precisely because of its flexibility, the hardware cannot make the assumptions needed to optimize, and can only implement it very conservatively.

This is especially true for cutting one triangle into many. Subdivision is inherently an operation with a fixed algorithm, but when it is done in a fully programmable stage, the hardware does not even know before execution that you intend to subdivide, so there is no way to optimize. As triangle subdivision became more common, this requirement gradually grew.

2-tessellation

After the vertex shader, the GPU pipeline gained a dedicated tessellation stage. It is not one unit but three (a small sketch of this division of labor follows the list):

  • First, there is a programmable hull shader, in which you specify how each primitive should be subdivided: for example, into how many interior parts, and into how many segments each edge is cut.

  • Then there is the tessellator, a fixed-function unit that performs the subdivision with a fixed algorithm.

  • Next is a domain shader, which computes the attributes of each vertex produced by the subdivision, according to the subdivision parameters.

The whole stage is also optional; if it is not enabled, data is passed straight through.
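Here is a hypothetical CPU-side C++ sketch of the three roles (illustrative only, not the real hardware interface): the "hull shader" picks a factor, the "tessellator" generates a fixed barycentric grid, and the "domain shader" turns each grid point into a vertex.

```cpp
#include <cstdio>
#include <vector>

struct Vec3 { float x, y, z; };
struct Bary { float u, v, w; };   // barycentric coordinates

// "Hull shader": programmable choice of how finely to subdivide.
int hullShader() { return 3; }    // tessellation factor

// "Tessellator": fixed algorithm that generates a barycentric grid of
// points for the chosen factor (vertex positions are not known here).
std::vector<Bary> tessellate(int n) {
    std::vector<Bary> pts;
    for (int i = 0; i <= n; ++i)
        for (int j = 0; j <= n - i; ++j) {
            int k = n - i - j;
            pts.push_back({ float(i) / n, float(j) / n, float(k) / n });
        }
    return pts;
}

// "Domain shader": programmable evaluation of each generated vertex,
// here plain linear interpolation of the patch corners.
Vec3 domainShader(const Bary& b, const Vec3 c[3]) {
    return { b.u * c[0].x + b.v * c[1].x + b.w * c[2].x,
             b.u * c[0].y + b.v * c[1].y + b.w * c[2].y,
             b.u * c[0].z + b.v * c[1].z + b.w * c[2].z };
}

int main() {
    const Vec3 corners[3] = { {0,0,0}, {1,0,0}, {0,1,0} };
    for (const Bary& b : tessellate(hullShader())) {
        Vec3 p = domainShader(b, corners);
        std::printf("(%.2f, %.2f, %.2f)\n", p.x, p.y, p.z);
    }
}
```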


At this point some people thought: since the GPU has such powerful computing ability, can it do more than render graphics? Could it also serve as a more general parallel computer? The earliest practice was to render one large triangle covering the screen and do the general parallel computation in the pixel shader, so that each pixel is effectively one thread.
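Conceptually, "one pixel = one thread" looks like this hypothetical C++ sketch: the screen-covering triangle guarantees the pixel shader runs once per pixel, and each invocation uses its pixel coordinate as the index of the data element it processes.

```cpp
#include <cstdio>
#include <vector>

// Early GPGPU, conceptually: a screen-covering triangle makes this
// "pixel shader" run once per pixel, and the pixel coordinate doubles
// as the index of the data element this thread processes.
float pixelShader(int x, int y, const std::vector<float>& input, int width) {
    float v = input[y * width + x];
    return v * v;                       // the actual per-element work
}

int main() {
    const int width = 4, height = 2;
    std::vector<float> input(width * height, 3.0f);
    std::vector<float> output(width * height);

    for (int y = 0; y < height; ++y)    // on a GPU the rasterizer provides
        for (int x = 0; x < width; ++x) // these loops "for free", in parallel
            output[y * width + x] = pixelShader(x, y, input, width);

    std::printf("output[0] = %.1f\n", output[0]);
}
```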


Although this can solve some problems, the single-input single-output limitation still exists, and the data still has to pass through the entire pipeline, vertex shader and all.

So there is still waste; and on top of that, this approach forces developers to learn the graphics pipeline, which raises the bar.

Even that did not stop the explorers of around 2003. The direction that grew out of this is called GPGPU.

3-compute shader

Using GPUs for general computing: this demand further gave birth to GPGPU with hardware support.

Multiple inputs and multiple outputs become possible. A program can read arbitrarily and write arbitrarily, no longer needs to pass through the fixed-function pipeline units, and uses the GPU's computing units directly for its calculations.

This kind of shader is called the compute shader. It exists independently of the graphics pipeline; its inputs and outputs are all memory, and the pipeline around it is far smaller and simpler than the graphics pipeline.

The entire compute pipeline has only one stage, which makes the development difficulty and program structure much closer to traditional programming, and lowers the threshold considerably.
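A hedged C++ sketch of what this model buys (CPU emulation, not a real compute API): every "thread" receives an id and, unlike a pixel shader, may read and write memory anywhere, e.g., scattering into a shared histogram.

```cpp
#include <cstdio>
#include <vector>

// A compute-shader-like kernel: each invocation gets a thread id and,
// unlike a pixel shader, may read and write memory at any location.
// Scattering into a histogram is the classic example of "write anywhere".
void computeShader(int threadId, const std::vector<int>& input,
                   std::vector<int>& histogram) {
    int bucket = input[threadId] % int(histogram.size());
    ++histogram[bucket];   // on a real GPU this would be an atomic add
}

int main() {
    std::vector<int> input = { 3, 1, 4, 1, 5, 9, 2, 6 };
    std::vector<int> histogram(4, 0);

    // One dispatch: one "thread" per input element, no vertices,
    // no rasterizer, no pixels -- just memory in, memory out.
    for (int id = 0; id < int(input.size()); ++id)
        computeShader(id, input, histogram);

    for (int b = 0; b < int(histogram.size()); ++b)
        std::printf("bucket %d: %d\n", b, histogram[b]);
}
```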


So far, the GPU pipeline is very close to today's, and it can meet the various demands of real-time rendering and computing.

At the same time, the GPU pipeline has also become very complicated. Can it get even more complicated? Yes. Before the rasterizer sit the vertex shader, hull shader, domain shader, and geometry shader; the meaning of their existence is to transform and decompose geometric data and finally feed it to the rasterizer.

But none of them can break away from the input data: to render more complex objects, more complex data must be fed in.

To solve this problem, the GPU must be able to generate large amounts of complex geometry by itself, with little or no input data. The compute shader's arbitrary reads and writes could do this, but the compute shader cannot be connected to the rasterizer. This requirement gave birth to the amplification shader and the mesh shader.


The amplification shader is responsible for specifying how many times the mesh shader executes, and the mesh shader is responsible for generating geometry. At this point the rendered unit is no longer a primitive but a small patch of mesh called a meshlet.


When a meshlet is sent to the amplification shader, it can decide whether that meshlet needs further processing; if so, it sends it down to the mesh shader, which produces a pile of richly detailed primitives. Although these two shaders can replace the original geometry stages, the GPUs and programs that support them are still few, so in current GPUs they coexist with the original pipeline.
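A hypothetical C++ sketch of the division of labor (CPU emulation with made-up types, not the real API): the amplification stage filters and launches, and the mesh stage manufactures primitives for each surviving meshlet.

```cpp
#include <cstdio>
#include <vector>

struct Meshlet  { int id; bool visible; };  // a small bundle of geometry
struct Triangle { float dummy; };           // stand-in for real vertex data

// "Mesh shader": generates primitives for one meshlet, from little or
// no input data.
std::vector<Triangle> meshShader(const Meshlet& m) {
    return std::vector<Triangle>(4, Triangle{ float(m.id) });
}

int main() {
    std::vector<Meshlet> meshlets = { {0, true}, {1, false}, {2, true} };
    std::vector<Triangle> toRasterizer;

    // "Amplification shader": looks at each meshlet, decides whether it
    // needs further processing, and if so launches the mesh shader for it.
    for (const Meshlet& m : meshlets) {
        if (!m.visible) continue;           // e.g., culled off-screen
        std::vector<Triangle> tris = meshShader(m);
        toRasterizer.insert(toRasterizer.end(), tris.begin(), tris.end());
    }
    std::printf("%zu triangles sent on to the rasterizer\n",
                toRasterizer.size());
}
```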

Of course, the development of demand has not stopped. Over the years, games have used all kinds of methods to improve the realism of the experience.

These methods often conflict with each other, or rely on hacks with serious limitations. On the other hand, ray tracing, an old but general-purpose technique, had long failed to find good use on GPUs, not only because it is computationally intensive.

4-ray tracing

It also follows a completely different process from rasterization-based rendering. For a long time, researchers tried to achieve more efficient ray tracing on existing GPUs. That demand finally made substantial progress when GPUs gained the ability to provide ray tracing, and at that point an independent pipeline appeared, containing various new types of shaders.


  • ray generation shader: generates the rays
  • intersection shader: determines whether a ray intersects an object
  • any hit shader: when a ray hits an object, decides whether the ray should continue onward
  • closest hit shader: computes the color at the nearest point where a ray hits an object
  • miss shader: computes the color when a ray hits nothing
  • callable shader: used together with the others; it can be invoked dynamically

A minimal sketch of how these stages fit together follows.
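Here is a hedged C++ toy (CPU code, spheres instead of triangles, no acceleration structure, not the real ray tracing API) showing how ray generation, intersection, closest hit, and miss relate; the any hit and callable shaders are omitted for brevity.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

struct Vec3 { float x, y, z; };
struct Ray  { Vec3 origin, dir; };       // dir is assumed normalized
struct Sphere { Vec3 center; float radius; };

// "intersection shader": does the ray hit this object, and at what
// distance? Negative means no hit.
float intersect(const Ray& r, const Sphere& s) {
    Vec3 oc{ r.origin.x - s.center.x, r.origin.y - s.center.y,
             r.origin.z - s.center.z };
    float b = oc.x * r.dir.x + oc.y * r.dir.y + oc.z * r.dir.z;
    float c = oc.x * oc.x + oc.y * oc.y + oc.z * oc.z - s.radius * s.radius;
    float disc = b * b - c;
    return disc < 0 ? -1.0f : -b - std::sqrt(disc);
}

// "closest hit shader" and "miss shader": decide the final color.
Vec3 closestHit(int objectId) { return { objectId ? 1.0f : 0.0f, 0.5f, 0.0f }; }
Vec3 missShader()             { return { 0.1f, 0.1f, 0.3f }; }  // sky color

int main() {
    std::vector<Sphere> scene = { { {0, 0, 5}, 1 }, { {2, 0, 7}, 1 } };

    // "ray generation shader": one ray per pixel of a tiny 2x2 image.
    for (int y = 0; y < 2; ++y)
        for (int x = 0; x < 2; ++x) {
            Vec3 d{ x * 0.2f, y * 0.2f, 1.0f };
            float len = std::sqrt(d.x * d.x + d.y * d.y + d.z * d.z);
            Ray ray{ {0, 0, 0}, { d.x / len, d.y / len, d.z / len } };

            float nearest = 1e30f;       // track the closest intersection
            int hitId = -1;
            for (int i = 0; i < (int)scene.size(); ++i) {
                float t = intersect(ray, scene[i]);
                if (t > 0 && t < nearest) { nearest = t; hitId = i; }
            }
            Vec3 c = hitId >= 0 ? closestHit(hitId) : missShader();
            std::printf("pixel(%d,%d) = (%.1f, %.1f, %.1f)\n",
                        x, y, c.x, c.y, c.z);
        }
}
```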

5-GPU logic module composition

The details of hardware ray tracing will be explained in a later issue. The same idea can also be applied to more fields: for example, GPUs have added tensor computing modules dedicated to neural networks, as well as video encoding and decoding modules, all of which are independent pipelines.


Here we can see the second big difference between CPU and GPU. The CPU is meant to be one general-purpose module: when programming, you simply write straight through.

The GPU is divided into multiple modules, each with its own characteristics and uses. When programming, developers need a clear understanding of these modules and must plan in the program how each will be used.


Current GPU pipelines cannot call each other. If you want to use the compute pipeline from within the graphics pipeline, you have to run the compute work first, write the result into a texture or buffer, and then read it back in the graphics pipeline.
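The hand-off pattern looks roughly like this C++ sketch (a CPU stand-in, not any real graphics API): the compute pass finishes writing into a shared buffer before the graphics pass reads it.

```cpp
#include <cstdio>
#include <vector>

// Pass 1: the compute pipeline writes its results into a buffer...
void computePass(std::vector<float>& buffer) {
    for (int i = 0; i < int(buffer.size()); ++i)
        buffer[i] = float(i) * 0.5f;        // e.g., simulated particle data
}

// Pass 2: ...and the graphics pipeline later reads that same buffer.
// The two pipelines cannot call each other; the buffer is the only bridge.
void graphicsPass(const std::vector<float>& buffer) {
    float sum = 0;
    for (float v : buffer) sum += v;        // stand-in for "draw using it"
    std::printf("graphics pass consumed %zu values (sum %.1f)\n",
                buffer.size(), sum);
}

int main() {
    std::vector<float> sharedBuffer(8);     // texture/buffer shared by both
    computePass(sharedBuffer);              // must finish first
    graphicsPass(sharedBuffer);             // then the graphics side reads
}
```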

More than ten years ago, I proposed the idea of a configurable pipeline: not only programmable units, but also programmable connections between them, so that the pipeline could be assembled as needed.


No such GPU exists yet. But the various units of today's graphics pipeline now also have the ability to output arbitrarily, which partially solves this problem.


For a long time, people who develop on the GPU have fallen into two camps. One camp basically uses only the GPU's graphics pipeline; the typical application is game graphics.

The other camp basically uses only the GPU's compute pipeline; the typical application is machine learning. Because the latter has been so popular in recent years, covering everything from PCs to phones to servers, some companies claim to make GPU chips that are really so-called GPGPU chips: "general-purpose GPUs" that contain only the compute pipeline.


As I said in the last issue, the G in GPU stands for graphics; only a chip with a graphics pipeline can be called a GPU.

Only general-purpose computing performed on a GPU can be called GPGPU.

(Note this dependency and its order: GPGPU presupposes a GPU.)

A chip with only computing power and no graphics capability that calls itself a GPU is just a scam. Isn't a graphics processor without graphics capability a contradiction in terms?


What's more, even in pure-compute scenarios, some fixed-function units can be put to good use as a supplement to general computing. For example, the rasterizer can serve as an efficient interpolator, spreading data linearly across pixels.

The alpha blending in the output merger can likewise serve as an efficient data accumulator.

Both can further improve the performance of general computing.
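A hypothetical C++ sketch of both tricks (a CPU emulation of what the fixed-function units would do for free): linear interpolation across a span, and additive blending accumulating overlapping draws.

```cpp
#include <cstdio>
#include <vector>

int main() {
    // Rasterizer as a free interpolator: given values at a span's two
    // endpoints, it spreads the value linearly across every pixel.
    float left = 0.0f, right = 10.0f;
    const int width = 5;
    std::vector<float> interpolated(width);
    for (int x = 0; x < width; ++x)
        interpolated[x] = left + (right - left) * x / (width - 1);

    // Alpha blending (output merger) as a free accumulator: with additive
    // blending, every primitive drawn over a pixel adds into it, so
    // drawing N overlapping primitives sums N values with no loop logic
    // in the shader itself.
    std::vector<float> framebuffer(width, 0.0f);
    for (int draw = 0; draw < 3; ++draw)          // three overlapping draws
        for (int x = 0; x < width; ++x)
            framebuffer[x] += interpolated[x];    // blend op: ADD

    for (int x = 0; x < width; ++x)
        std::printf("pixel %d: interp %.1f, accumulated %.1f\n",
                    x, interpolated[x], framebuffer[x]);
}
```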

So far, at the logical level, we have seen what a GPU is and what a current GPU should contain. However, if GPU hardware were implemented directly according to this module division, there would be two huge problems:

  • First, there are so many types of shaders; if a program uses only some of them, the load becomes unbalanced. Wouldn't the remaining computing power be wasted?

  • Second, the pipeline is already so complicated; how can it be laid out in hardware within a limited cost?


Fortunately, both problems can be solved in the same way. In this issue we have walked through the GPU's logical modules and established, from today's needs, what the composition of a GPU should be.

In the next issue, we will take a closer look at the composition of the GPU at the hardware level.

Content from: bilibili-Gong Da's Grocery Store

Origin: blog.csdn.net/weixin_45264425/article/details/130475511