Foreword

In the previous issues we went through the functional modules of a modern GPU and how they are deployed on the hardware in a controllable way. Hardware alone is not enough, though: the software has to be usable too. In this issue, let's look at how a program controls the GPU to get work done.

In the simplest model, a program reads and writes the hardware ports exposed by the operating system and operates the graphics hardware directly. But if every program had to be written once for every piece of hardware on every operating system, development efficiency would be very low.

So people naturally turned to abstraction and formed a common interface layer. The program uses this interface to tell the lower layers what to do, and the lower layers are responsible for how to do it. This interface is the application programming interface, or API.

The program only needs to be written once against the graphics API, with almost no need to consider differences between operating systems and hardware. Below the graphics API, software supplied by each hardware manufacturer translates the calls for its own hardware.

Over time, people found that different implementations of the same API also share a large number of common parts. So the implementation of the API was layered again, with an additional abstraction layer called the device driver interface, or DDI.

The part above the DDI belongs to the operating system and is responsible for data validity checking, memory allocation, and so on. Below the DDI is the driver, which is responsible for the parts specific to each piece of hardware. In effect, the operating system translates API calls into DDI calls, and the driver translates DDI calls into operations on the hardware.

This is the architecture of the graphics API software stack.
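The layering described above can be sketched in a few lines of Python. This is a toy model rather than any real graphics stack: `DriverDDI`, `VendorADriver`, `VendorBDriver`, `Runtime`, and the command strings are all hypothetical names invented for illustration. It only tries to show that the program is written once against the API, the shared work (here, validation) sits above the DDI, and only the driver below the DDI differs per hardware.

```python
from typing import Protocol


class DriverDDI(Protocol):
    """Hypothetical DDI: the contract every vendor driver must fulfil."""

    def draw(self, vertex_count: int) -> str: ...


class VendorADriver:
    """Made-up driver for vendor A's hardware."""

    def draw(self, vertex_count: int) -> str:
        return f"A_DRAW {vertex_count}"


class VendorBDriver:
    """Made-up driver for vendor B's hardware, using a different command format."""

    def draw(self, vertex_count: int) -> str:
        return f"b.draw(count={vertex_count})"


class Runtime:
    """OS-side part above the DDI: shared work such as validation lives here."""

    def __init__(self, driver: DriverDDI) -> None:
        self._driver = driver

    def draw(self, vertex_count: int) -> str:
        # Validity checking is done once here, not repeated in every driver.
        if vertex_count <= 0:
            raise ValueError("vertex_count must be positive")
        return self._driver.draw(vertex_count)


def render_frame(api: Runtime) -> str:
    """The program: written once against the API, unaware of the hardware."""
    return api.draw(3)


if __name__ == "__main__":
    print(render_frame(Runtime(VendorADriver())))
    print(render_frame(Runtime(VendorBDriver())))
```

Swapping the driver changes the hardware commands that come out, while `render_frame` stays untouched.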

Architecture of Graphics API Software Stack

1-Direct3D->D3D

Of course, the above is only the ideal situation. In reality it is often adjusted, especially since the operating system itself is divided into user mode and kernel mode, which makes the combination more complicated. Below I take several representative APIs as examples to look at their real-world architectures. The first is Microsoft's Direct3D, or D3D. Only the official implementation on Windows is discussed here.

This API is not cross-platform, but it is cross-vendor.

In the Windows XP era, the software stack was not quite the same as the ideal situation. The operating system provided a D3D runtime whose upper side was the API and whose lower side was the DDI, and beneath the DDI sat a kernel-mode driver provided by the manufacturer. This architecture is called XDDM (also known as XPDM).

With growing demands for stability, performance, and resource sharing, in the Vista era the runtime and the vendor driver were each split into two parts, user mode and kernel mode. Each of the two modes has its own DDI. When a program calls the D3D API, the D3D runtime performs some data validation and then reaches the user-mode driver (UMD) provided by the manufacturer through the user-mode DDI.

The UMD compiles shader bytecode into manufacturer-specific instructions, translates command queues, and so on, then passes the work to kernel mode. The kernel-mode part of the runtime is called dxgkrnl (the DirectX graphics kernel); it handles video memory allocation, device interrupt management, and so on. Through the kernel-mode DDI it calls the kernel-mode driver (KMD) provided by the manufacturer, which performs manufacturer-specific operations such as address translation and finally hands the work to the GPU for execution.

This architecture is called WDDM.
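As a rough illustration of the order in which a call crosses these layers, here is a minimal Python sketch. It is not how Windows is implemented: the function names, the `vendor_isa(...)` string, and the trace messages are all made up for this example, and the OS kernel component is simply labelled `dxgkrnl` in the trace. Each function stands in for one layer and records what the text says that layer is responsible for.

```python
from typing import List


def d3d_runtime(shader_bytecode: str, trace: List[str]) -> None:
    # User mode, provided by the OS: validate, then cross the user-mode DDI.
    if not shader_bytecode:
        raise ValueError("empty shader bytecode")
    trace.append("D3D runtime: validated the call")
    user_mode_driver(shader_bytecode, trace)


def user_mode_driver(shader_bytecode: str, trace: List[str]) -> None:
    # User mode, provided by the vendor: compile to vendor-specific instructions.
    vendor_code = f"vendor_isa({shader_bytecode})"
    trace.append(f"UMD: compiled bytecode to {vendor_code}")
    dxg_kernel(vendor_code, trace)


def dxg_kernel(vendor_code: str, trace: List[str]) -> None:
    # Kernel mode, provided by the OS: memory allocation, interrupt management.
    trace.append("dxgkrnl: allocated video memory")
    kernel_mode_driver(vendor_code, trace)


def kernel_mode_driver(vendor_code: str, trace: List[str]) -> None:
    # Kernel mode, provided by the vendor: address translation, then submission.
    trace.append("KMD: translated addresses")
    trace.append(f"GPU: executing {vendor_code}")


def submit(shader_bytecode: str) -> List[str]:
    """Send one call down the whole stack and return the recorded path."""
    trace: List[str] = []
    d3d_runtime(shader_bytecode, trace)
    return trace


if __name__ == "__main__":
    for step in submit("ps_main"):
        print(step)
```

Only the two middle functions on the vendor side would differ between manufacturers; the first and third belong to the operating system.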

Splitting the driver into user mode and kernel mode, and moving most of the code into user mode, greatly improves stability, because a crash in user-mode code takes down only the offending process rather than the whole system.

Because of the two operating system components, the D3D runtime and dxgkrnl, developing a driver has changed for manufacturers from an essay question into a fill-in-the-blank question. The workload is greatly reduced, which also leads to less differentiation between manufacturers and an overall improvement in quality.

As a result, after Vista, blue screens caused by drivers became far rarer than in the XP era. Over the years D3D has gone through many versions; the most commonly used at present are 9, 11, and 12.

The API and the user-mode DDI differ greatly between D3D versions, and there is not much code compatibility. Every time a new version is released, programs and UMDs have to be heavily revised or even rewritten before they can use it.

2-OpenGL

The second example is OpenGL, a cross-platform, cross-vendor graphics API published by Khronos.

Khronos only organizes the standards meetings; what the API supports is decided by the software and hardware vendors in the organization.

On Windows, OpenGL competes directly with Microsoft's own D3D, so Microsoft has repeatedly tried to squeeze OpenGL out of Windows. Many of these attempts were dropped after user backlash.

Windows does not do much for OpenGL. It just provides a framework called the Installable Client Driver (ICD), which lets hardware manufacturers implement the OpenGL runtime and UMD themselves. Once in kernel mode, the path still goes through dxgkrnl and the same KMD.

On Linux, OpenGL can be implemented in two ways. One is for the manufacturer to implement the entire OpenGL stack itself; the other is based on the Mesa framework. Mesa provides an open-source OpenGL runtime and calls the vendor driver through a DDI to operate the GPU. Mesa was later extended to support other APIs such as OpenGL ES and Vulkan.

OpenGL is backward compatible from version 1.0 in the early 1990s all the way to the latest version, 4.6. Code written back then still works today, and OpenGL itself is the same on every platform. Only the part that deals with the window system differs slightly, so adapting a program to another platform is not difficult.

OpenGL ES goes a step further and abstracts the window system as well, in the form of EGL, which simplifies cross-platform development even more.

Note that although OpenGL and OpenGL ES are very similar, whenever they come up it is almost always their differences that matter. They should therefore be treated as two different APIs.

Another representative API is NVIDIA's CUDA, which is cross-platform but not cross-vendor. Officially, it runs only on NVIDIA GPUs. It is a compute-only API. When necessary it can interoperate with graphics APIs, handing its computed results over to the graphics pipeline. More often, though, CUDA is used for pure computation, such as finite element simulation and neural network training. CUDA is not a higher-level abstraction over the graphics APIs: some of the functionality it provides does not exist in them at all.

For example, CUDA introduced the concept of shared memory from the very beginning. Used well, it can significantly improve GPU efficiency. At the time, no graphics API exposed it; it was available only through CUDA. The later compute shaders were also designed under CUDA's influence.

Comparing the current APIs side by side reveals a pattern. CUDA and Metal are each owned by a single vendor, from software down to hardware. When the hardware gains a new feature, it can be exposed directly through the API. There is no need to negotiate with other manufacturers, so the response is very fast.

OpenGL, OpenGL ES, and Vulkan follow a different model: Khronos owns the interface, and the hardware vendors own the implementations. Their design takes a bottom-up approach.

All of them provide an extension mechanism, so the API can be extended without releasing a new API version.
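From the program's side, using an extension usually starts with asking whether the driver offers it. OpenGL, for example, has traditionally reported its extensions as one space-separated string of names. The sketch below assumes such a string has already been obtained; `has_extension` and `SAMPLE_EXTENSIONS` are hypothetical names for this example, and the sample value is illustrative rather than output from a real driver.

```python
def has_extension(extension_string: str, name: str) -> bool:
    """Return True if `name` appears as a whole token in a space-separated list."""
    # Split into tokens first: a plain substring search would wrongly match
    # a name that is only the prefix of a longer extension name.
    return name in extension_string.split()


# Illustrative value only; a real program would query this from the driver.
SAMPLE_EXTENSIONS = "GL_ARB_compute_shader GL_EXT_texture_filter_anisotropic"

if __name__ == "__main__":
    print(has_extension(SAMPLE_EXTENSIONS, "GL_ARB_compute_shader"))
    print(has_extension(SAMPLE_EXTENSIONS, "GL_EXT_texture_filter"))
```

A program would typically run a check like this once at startup and choose its code path accordingly.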

Looking at APIs vertically, over time, the trend is toward thinner APIs: more is handed to the program instead of the runtime and drivers. Because the program knows its own intent, the API does not need to guess, and the result is more efficient execution. D3D12 and Vulkan, which have appeared in recent years, follow this trend.

Such APIs are lower-level, and developing with them is more like writing a driver, requiring a lot of detailed work. Fortunately, there is generally a rendering engine acting as an abstraction layer above the API. It can wrap different APIs behind the same interface, which smooths over the inconvenience of the new APIs while keeping the efficiency gains they bring.

As the layered architecture above shows, the GPU simply executes the operations sent by the driver and does not know which API they came from. Saying that a GPU "supports" an API really means that the GPU manufacturer provides a driver for that API. So which APIs and features a GPU supports depends on the driver.

This has happened before: the NVIDIA GeForce 6800 hardware does not support 32-bit floating-point blending, but its OpenGL driver reports that it does. When a program uses the feature, the driver switches to a software path to emulate floating-point blending. There is nothing wrong with this; it is still a valid implementation, just with a severe loss of efficiency.
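The idea that "support" is whatever the driver reports can be sketched as follows. This is a toy model, not NVIDIA's driver: `ToyDriver`, the feature names `blend_unorm8` and `blend_float32`, and the returned path labels are all invented for illustration, and both paths compute the blend on the CPU so the example can run anywhere.

```python
from typing import Tuple


class ToyDriver:
    """Hypothetical driver that advertises float blending the hardware lacks."""

    HARDWARE_FEATURES = {"blend_unorm8"}
    ADVERTISED_FEATURES = {"blend_unorm8", "blend_float32"}

    def supports(self, feature: str) -> bool:
        # The program sees what the driver claims, not what the chip can do.
        return feature in self.ADVERTISED_FEATURES

    def blend(self, feature: str, src: float, dst: float,
              alpha: float) -> Tuple[float, str]:
        if feature not in self.ADVERTISED_FEATURES:
            raise NotImplementedError(feature)
        # Pick the path: the hardware if it can, otherwise software emulation.
        path = "hardware" if feature in self.HARDWARE_FEATURES else "software"
        # A real driver would submit the hardware case to the GPU; this toy
        # computes the same blend on the CPU and only records the chosen path.
        result = src * alpha + dst * (1.0 - alpha)
        return result, path


if __name__ == "__main__":
    driver = ToyDriver()
    print(driver.supports("blend_float32"))
    print(driver.blend("blend_float32", 1.0, 0.0, 0.25))
```

The program gets a correct result either way; only the path label (and, on real hardware, the speed) differs.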

On the other hand, drivers are tightly coupled to the operating system. Even for the same API, the drivers on different operating systems are completely different. Change the operating system, and the driver has to be rewritten.

For example, Qualcomm's Adreno GPU supports OpenGL ES on Android and D3D on Windows, because Qualcomm provides different drivers on the two platforms. One cannot conclude that OpenGL ES is supported on Windows just because it is supported on Android. Many overconfident people have tripped up on this point.

We have now seen the architecture of the software stack and some conventional implementations; next, let's look at some different approaches. Throughout the architecture, what each layer does is translate downward.

Adjacent layers are separated by interfaces, and the upper layer does not need to know how the lower layer does its job. So an API does not always have to go down to a DDI. For example, ANGLE is the most commonly used OpenGL ES implementation on Windows. It is just a user-mode library that translates OpenGL ES into D3D11, OpenGL, Vulkan, and so on.

In this way, it provides OpenGL ES support without changing the operating system or the drivers. MoltenVK is of the same kind: it translates Vulkan into Metal, solving the problem that Apple's platforms do not support Vulkan. Even more curious is D3D11On12. Instead of adding a translation layer on top, it implements a D3D11 UMD using D3D12.

The program still calls the original D3D11 runtime, but once the call reaches the UMD, it goes back up to the D3D12 API and then continues downward from there.
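A translation layer such as ANGLE can be pictured, very roughly, as a lookup from calls in one API to calls in another. The sketch below is purely illustrative and is not taken from ANGLE's source code: the function names on both sides are real API entry points, but the one-to-one pairing, `GLES_TO_D3D11`, and `translate` are assumptions made for this example. A real translation layer also has to track state, convert shaders, and manage resources.

```python
from typing import Dict, List

# Illustrative pairing only; real translation is far from one-to-one.
GLES_TO_D3D11: Dict[str, str] = {
    "glClear": "ID3D11DeviceContext::ClearRenderTargetView",
    "glDrawArrays": "ID3D11DeviceContext::Draw",
    "glDrawElements": "ID3D11DeviceContext::DrawIndexed",
}


def translate(gles_calls: List[str]) -> List[str]:
    """Map a sequence of OpenGL ES call names to D3D11 call names."""
    translated = []
    for call in gles_calls:
        if call not in GLES_TO_D3D11:
            # The layer can only offer what it knows how to express below.
            raise NotImplementedError(f"no translation for {call}")
        translated.append(GLES_TO_D3D11[call])
    return translated


if __name__ == "__main__":
    for line in translate(["glClear", "glDrawArrays"]):
        print(line)
```

Because the whole thing lives in a user-mode library, nothing in the operating system or the driver has to change.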

Another unconventional category is software emulation of the GPU. Such a driver does not connect to GPU hardware at all, but does everything on the CPU.

Mesa has built-in software emulation drivers.

D3D also has one, called WARP (Windows Advanced Rasterization Platform). Be careful not to confuse it with the warp of GPU threads mentioned in the previous issue; it is a completely different thing. The name is just a deliberate reference to warp speed, implying that it is very fast. Having the CPU emulate a GPU in this way can assist software and hardware development and debugging.

For example, if you invent a new feature, you can emulate it before the hardware is produced in order to pin down its details, and the emulation then serves as a reference for the hardware design. On a server with no GPU installed, software emulation can also be used as a stopgap to run some GPU programs.
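To make "doing everything on the CPU" concrete, here is a minimal sketch of one GPU job, filling a triangle, written as plain Python loops. It is not how Mesa's software drivers or WARP work internally; `edge` and `rasterize_triangle` are names chosen for this example, and the edge-function test used here is just one common way to decide whether a pixel centre lies inside a triangle.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def edge(a: Point, b: Point, p: Point) -> float:
    """Signed area test: tells which side of the edge a->b the point p is on."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize_triangle(width: int, height: int,
                       v0: Point, v1: Point, v2: Point) -> List[List[int]]:
    """Fill a triangle into a width x height grid of 0/1, entirely on the CPU."""
    framebuffer = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # sample at the pixel centre
            w0 = edge(v1, v2, p)
            w1 = edge(v2, v0, p)
            w2 = edge(v0, v1, p)
            # Accept either winding order: all three tests share one sign.
            all_non_negative = w0 >= 0 and w1 >= 0 and w2 >= 0
            all_non_positive = w0 <= 0 and w1 <= 0 and w2 <= 0
            if all_non_negative or all_non_positive:
                framebuffer[y][x] = 1
    return framebuffer


if __name__ == "__main__":
    for row in rasterize_triangle(4, 4, (0, 0), (4, 0), (0, 4)):
        print("".join("#" if pixel else "." for pixel in row))
```

A GPU runs this per-pixel test for many pixels in parallel; the CPU version visits them one by one, which is why emulation is useful for correctness and debugging rather than speed.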

In the same way, the driver can also send hardware operations to another machine for remote execution. This opens the way for virtual GPUs and cloud GPUs.

So, after the program calls the graphics API, the call walks through the entire stack, and the GPU renders, does the result get written directly to the frame buffer?

Not always. In one system many programs run at the same time. Who writes where in the frame buffer? How do we make sure they do not conflict?

There is another component here that is often overlooked: the compositor. Different systems have different components filling this role, for example DWM on Windows and SurfaceFlinger on Android.

Each program renders its own content not to the frame buffer but to a texture, and this texture is submitted to the operating system. Once the compositor has the textures, it calls the graphics API again, composites them into the frame buffer, and sends the result to the display.

This is called windowed mode, where each program has its own window and all programs coexist on the desktop.
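The compositor's job can be sketched as painting each window's texture into one shared frame buffer, back to front. This is a toy model under simplifying assumptions: textures are small grids of characters, windows are opaque, and `composite` is a hypothetical function written for this example rather than anything from DWM or SurfaceFlinger, which do this work through the graphics API on the GPU.

```python
from typing import List, Tuple

Texture = List[List[str]]
Window = Tuple[int, int, Texture]  # (left, top, texture)


def composite(width: int, height: int, windows: List[Window]) -> Texture:
    """Paint window textures into one frame buffer, back to front."""
    framebuffer = [["." for _ in range(width)] for _ in range(height)]
    for left, top, texture in windows:  # later windows are drawn on top
        for row_index, row in enumerate(texture):
            for col_index, pixel in enumerate(row):
                x, y = left + col_index, top + row_index
                # Clip against the screen so windows may hang off the edge.
                if 0 <= x < width and 0 <= y < height:
                    framebuffer[y][x] = pixel
    return framebuffer


if __name__ == "__main__":
    window_a = (0, 0, [["A", "A"], ["A", "A"]])
    window_b = (1, 1, [["B", "B"], ["B", "B"]])
    for row in composite(4, 3, [window_a, window_b]):
        print("".join(row))
```

Each program only ever writes to its own texture, so no two programs can collide in the frame buffer; the ordering of the window list decides which one is visible where they overlap.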

Some games enable full-screen exclusive mode, which gives slightly higher performance because the compositor is bypassed and the program writes to the frame buffer directly. But because the screen is monopolized, other windows cannot be displayed, and even the input method may not display properly.

On every operating system the compositor is transparent to programs; ordinary programs do not need to know it exists at all. That is why the compositor is rarely mentioned in discussions of the graphics software stack. Within the system, however, its effect is visible to the naked eye.

The frosted-glass windows on Windows and the window animations on macOS are also rendered by the compositor.

That completes the hardware and software stack of the GPU; we have now seen the entire path from top to bottom. The next issue will look at some details of the GPU graphics pipeline, especially rasterization.

The content comes from: bilibili-Gong Da's Grocery Store

Gong Da's video study notes: God's perspective on GPU (4) - complete software stack
