Metal 2 new feature: Raster Order Groups (ROG)

Background reading for this article: [Metal Engine Analysis (2): Traditional Deferred Rendering and TBDR]

1. What Raster Order Groups do

What does ROG do? The official explanation: it gives precise control over the order in which parallel fragment shader threads access the same pixel.

Put simply, when we render the objects in a scene, the fragment shaders of objects that overlap and occlude one another may access the data at the same pixel coordinates at the same time. This creates a race and leads to incorrect results. ROG synchronizes these accesses so that they happen in a defined order, which prevents the race.

That explanation may still not be intuitive, so here is the example the official material gives.

Consider a scene in which the camera sees two overlapping triangles. For transparent objects like these, the developer's code draws from back to front: the draw call for the blue triangle at the back is issued before the draw call for the green triangle in front. Metal executes draw calls in the order our code issues them, so the two draw calls appear to run serially, one after the other. In fact they do not. Computation on the GPU is highly parallel: although the CPU asks for the blue triangle first, Metal does not guarantee that the blue triangle's fragment shader runs before the green triangle's. Metal only guarantees that blending happens in draw call order.

So here is the problem. Blending of overlapping fragments is guaranteed to be serial, but the reads and writes that a fragment shader performs before blending are not. While the blended result of the blue triangle's fragment is being written to a pixel, the green triangle's fragment shader may already be reading that pixel's color, which creates a race.

ROG exists to resolve exactly this read/write conflict.
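The conflict described above can be reproduced deterministically on the CPU. The sketch below is a model, not Metal code: Fragment, blend, orderedSchedule and racySchedule are hypothetical names, the pixel is a single float channel, and the "race" is written out as one fixed interleaving in which the green fragment reads the pixel before the blue fragment's write has landed.

```cpp
#include <cassert>

// Hypothetical single-channel fragment: a source value and its opacity.
struct Fragment { float src; float alpha; };

// Classic "source over destination" blend for one channel.
static float blend(float dst, Fragment f) {
    assert(f.alpha >= 0.0f && f.alpha <= 1.0f);
    return f.src * f.alpha + dst * (1.0f - f.alpha);
}

// Draw call order is respected: blue reads, blends and writes,
// and only then does green read the pixel.
float orderedSchedule(float background, Fragment blue, Fragment green) {
    float pixel = background;
    pixel = blend(pixel, blue);
    pixel = blend(pixel, green);
    return pixel;
}

// One possible bad interleaving: both fragments read the pixel
// before either of them writes it back.
float racySchedule(float background, Fragment blue, Fragment green) {
    float pixel = background;
    float blueRead = pixel;
    float greenRead = pixel;            // stale: blue has not written yet
    pixel = blend(blueRead, blue);      // blue's result lands...
    pixel = blend(greenRead, green);    // ...and is overwritten by green
    return pixel;
}
```

With a background of 0, a blue fragment of (1.0, 0.5) and a green fragment of (0.0, 0.5), the ordered schedule yields 0.25 while the racy one yields 0.0: the blue triangle's contribution is lost entirely.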

2. How Raster Order Groups resolve read/write conflicts

ROG resolves read/write conflicts by synchronizing threads: it synchronizes the threads that map to the same pixel, or to the same sample when per-sample shading is in use. In practice the developer only has to mark the data in memory with the ROG attribute (in the Metal Shading Language, [[raster_order_group(n)]]). Among the threads that access the same pixel's data, a later thread then waits until the earlier thread has finished writing before it accesses the data. Apple's illustration of this shows ROG synchronizing two threads so that thread 2 waits for thread 1 to finish writing and only then reads.
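The waiting behaviour described above can be modelled on the CPU without real threads. The sketch below is only an analogy under stated assumptions: OrderedPixel and its members are hypothetical names, each fragment carries its position in draw call order, and a fragment that arrives early is parked (the stand-in for a waiting thread) until every earlier fragment has finished its read-modify-write. It says nothing about how the GPU actually implements the ordering.

```cpp
#include <cassert>
#include <map>

// Hypothetical single-channel fragment: a source value and its opacity.
struct Fragment { float src; float alpha; };

// CPU-side model of one pixel whose accesses are ordered by draw call.
class OrderedPixel {
public:
    // `primitive` is the fragment's index in draw call order (0 = first).
    void arrive(unsigned primitive, Fragment f) {
        assert(primitive >= next_ && "each primitive may arrive only once");
        waiting_[primitive] = f;  // park it; it may have to wait
        // Release every parked fragment whose predecessors are all done.
        while (!waiting_.empty() && waiting_.begin()->first == next_) {
            Fragment g = waiting_.begin()->second;
            color_ = g.src * g.alpha + color_ * (1.0f - g.alpha);
            waiting_.erase(waiting_.begin());
            ++next_;
        }
    }

    float color() const { return color_; }
    unsigned waiting() const { return static_cast<unsigned>(waiting_.size()); }

private:
    float color_ = 0.0f;                    // pixel contents, starts black
    unsigned next_ = 0;                     // next primitive allowed in
    std::map<unsigned, Fragment> waiting_;  // fragments that arrived early
};
```

If the green fragment (primitive 1) arrives first, it stays parked and the pixel is untouched; once the blue fragment (primitive 0) arrives, both are applied in draw call order, so the result no longer depends on which shader happened to run first.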

Are Raster Order Groups only good for synchronizing threads to resolve read/write conflicts? No. With Metal 2 on A11, Raster Order Groups were extended with new capabilities that make them more powerful and more flexible.

3. New in Metal 2 on A11: Multiple Raster Order Groups

Starting with Metal 2 on A11, Raster Order Groups were extended. Besides synchronizing individual channels of imageblock data and threadgroup memory, they can now be defined multiple times, so developers can control thread synchronization at a finer granularity and further reduce the time threads spend waiting.

A typical example of using multiple Raster Order Groups to optimize rendering is the one mentioned in the other article on TBDR: single-pass deferred rendering.

That article explained that traditional two-pass deferred rendering writes the G-buffer to system memory in the first pass, then reads it back from system memory in the second pass for the deferred lighting calculation. With tile memory, A11 implements tile-based shading: the G-buffer is split into tile-sized pieces that stay resident in the GPU's imageblock memory, the deferred lighting calculation runs on them directly, and deferred rendering completes in a single pass, which reduces memory bandwidth.

So how do Raster Order Groups improve the performance of single-pass deferred rendering?

We know that deferred rendering mainly addresses the cost of rendering a scene with many light sources. On a GPU that runs its threads in the usual parallel fashion, deferred lighting with multiple lights proceeds as follows.

When the second light wants to read the G-buffer to compute lighting for the current pixel, it must wait until the first light has finished its calculation, including both its G-buffer reads and the write of its result, because the lighting result is kept together with the G-buffer.

We can now fix this by defining multiple Raster Order Groups. The developer separates the G-buffer resources from the lighting result by placing them in different Raster Order Groups: for example, the accumulated lighting result goes into the first group, and the G-buffer's albedo, normal, depth and so on go into the second. A11 can then treat the two groups independently. The second light can read the G-buffer data in the second group and start its lighting calculation right away; it has to wait only when it writes its result into the lighting group, in order after the first light.
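The saving described above can be estimated with a toy timeline. This is a rough model under loud assumptions, not a measurement: LightCost, finishSingleGroup and finishTwoGroups are hypothetical names, the costs are arbitrary time units, every light's thread starts at time 0, and the hardware is assumed to have enough parallelism to run all G-buffer reads and lighting computations at once.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical per-light cost for one pixel, in arbitrary time units.
struct LightCost { int read; int compute; int write; };

// One group: G-buffer and lighting result are ordered together, so each
// light's whole read + compute + write runs after the previous light's.
int finishSingleGroup(const std::vector<LightCost>& lights) {
    int t = 0;
    for (const LightCost& l : lights) {
        assert(l.read >= 0 && l.compute >= 0 && l.write >= 0);
        t += l.read + l.compute + l.write;
    }
    return t;
}

// Two groups: reads and computation overlap across lights; only the
// write into the lighting group is serialized in submission order.
int finishTwoGroups(const std::vector<LightCost>& lights) {
    int lightingFree = 0;  // time at which the lighting group is free
    for (const LightCost& l : lights) {
        assert(l.read >= 0 && l.compute >= 0 && l.write >= 0);
        int ready = l.read + l.compute;          // started at time 0
        int start = std::max(ready, lightingFree);
        lightingFree = start + l.write;
    }
    return lightingFree;
}
```

For two lights that each cost 4 units to read, 5 to compute and 1 to write, the single-group timeline finishes at 20 units and the two-group timeline at 11. In real shader code the two groups would correspond to different indices of the raster_order_group attribute on the lighting and G-buffer members.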

Apple's official demo of single-pass TBDR deferred rendering already uses multiple Raster Order Groups for this performance optimization.

Origin blog.csdn.net/cordova/article/details/103031716