Unity Performance Optimization: Batching

Foreword

This series collects practical knowledge about performance optimization from day-to-day game development. This is the second article in the series; the previous article is:

Part 1: Unity Performance Optimization: Resources


In earlier versions of Unity, batching was handled mainly in the following three ways:

  • Static Batching
  • Dynamic Batching
  • GPU Instancing

All three come with strict usage restrictions. After Unity introduced the SRP (Scriptable Render Pipeline), it added a new batching method, the SRP Batcher, to widen the scope and improve the efficiency of batching. This article briefly introduces these batching techniques.

Draw Call, Batches, SetPass Call

Before looking at batching, you need to understand the metrics used to measure the CPU cost of rendering:

  • Draw Call

In the early days of the Unity engine, the CPU cost of rendering was mostly measured by the number of Draw Calls. In the rendering pipeline, the CPU works in the application stage, which mainly prepares and submits data, and the number of Draw Calls represents how many times the CPU submits data to the GPU. A Draw Call itself is only a few bytes in the command stream; the main performance cost lies in the CPU's data preparation.

  • Batches

Because of batching, not every rendered object generates its own Draw Call, so a new metric was introduced: Batches.

  • SetPass Call

As mentioned earlier, the CPU's rendering cost is generally concentrated not in issuing the Draw Call but in preparing the data for it, so counting data submissions is not an accurate measure. Switching materials between consecutive draws costs considerably more: this render-state setup is the most expensive CPU step in the whole rendering stage, so Unity uses the number of SetPass Calls as the measure of this cost.
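To make the difference between the two metrics concrete, here is a minimal Python model (not Unity code) that counts draw calls and SetPass-style state changes for a list of draws. It assumes, as a simplification, that a SetPass call happens exactly when a draw's material differs from the previous draw's; the `DrawItem` type and `count_draws_and_setpass` function are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DrawItem:
    """One renderer submitted for drawing (hypothetical model, not a Unity type)."""
    mesh: str
    material: str  # stands in for the shader pass plus material state


def count_draws_and_setpass(items):
    """Return (draw_calls, setpass_calls) for a given draw order.

    Simplification: a SetPass call is counted whenever the material differs
    from the material used by the previous draw.
    """
    draw_calls = len(items)
    setpass_calls = 0
    previous = None
    for item in items:
        if item.material != previous:
            setpass_calls += 1
            previous = item.material
    return draw_calls, setpass_calls


unsorted = [DrawItem("cube", "A"), DrawItem("sphere", "B"),
            DrawItem("cube", "A"), DrawItem("sphere", "B")]
sorted_by_material = sorted(unsorted, key=lambda item: item.material)

print(count_draws_and_setpass(unsorted))            # (4, 4)
print(count_draws_and_setpass(sorted_by_material))  # (4, 2)
```

Sorting by material leaves the draw call count unchanged but halves the state changes in this toy example, which is why the SetPass Call count is the more telling number.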

Overview of the main batching techniques

The following briefly describes the batching methods commonly used in Unity. These descriptions come mainly from Unity's official documentation, some of them copied directly, so the information is relatively accurate:

1. Static Batching

According to Unity's official documentation, Static Batching works as follows:

  • Convert static game objects to world space and build a shared vertex and index buffer for them.
  • If Optimize Mesh Data is enabled, Unity removes any vertex elements not used by any shader variant when building the vertex buffer. To do this, Unity performs some special keyword checks; for example, if Unity does not detect the LIGHTMAP_ON keyword, it removes the lightmap UVs from the batch.
  • Unity then executes a series of simple draw calls for the visible GameObjects in the same batch, with almost no state changes between calls. Technically, Unity does not reduce the number of API draw calls; instead it reduces the state changes between them (which is the resource-intensive part). On most platforms, a batch is limited to 64k vertices and 64k indices (48k indices on OpenGL ES, 32k indices on macOS).
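The steps above can be sketched as a small Python model: each static object's vertices are moved into world space once and appended to a shared vertex buffer, its indices are offset to point into that buffer, and a new batch is started when a vertex limit would be exceeded. This is a simplified illustration under stated assumptions, not Unity's implementation: the "world transform" is translation only, all objects are assumed to share one material, only the vertex limit is modelled (not the index limit), and `build_static_batches` is a hypothetical name.

```python
# Hypothetical model of static batching (not Unity's actual code).
MAX_VERTICES_PER_BATCH = 65535  # assumption standing in for the "64k vertices" limit


def translate(vertices, offset):
    """Stand-in for the object-to-world transform (translation only)."""
    ox, oy, oz = offset
    return [(x + ox, y + oy, z + oz) for x, y, z in vertices]


def build_static_batches(objects, max_vertices=MAX_VERTICES_PER_BATCH):
    """objects: list of (vertices, indices, world_offset) tuples sharing one material.

    Returns a list of batches, each with a shared vertex buffer and index buffer.
    """
    batches = []
    current = {"vertices": [], "indices": []}
    for vertices, indices, offset in objects:
        would_overflow = len(current["vertices"]) + len(vertices) > max_vertices
        if current["vertices"] and would_overflow:
            batches.append(current)  # close this batch and start a new one
            current = {"vertices": [], "indices": []}
        base = len(current["vertices"])  # first slot this object occupies
        current["vertices"].extend(translate(vertices, offset))
        current["indices"].extend(base + i for i in indices)  # re-point indices
    if current["vertices"]:
        batches.append(current)
    return batches


triangle_vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
triangle_indices = [0, 1, 2]
scene = [(triangle_vertices, triangle_indices, (float(i), 0.0, 0.0)) for i in range(4)]

batches = build_static_batches(scene, max_vertices=6)
print(len(batches))            # 2
print(batches[0]["indices"])   # [0, 1, 2, 3, 4, 5]
```

Because the vertices are already in world space, the draws inside a batch need almost no per-object state changes, which matches the behaviour described in the documentation.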

Simply put, Static Batching merges small meshes into combined geometry in memory, so that at render time the CPU submits the combined data to the GPU and spends far less work per Draw Call. However, it has certain restrictions:

  • Objects must be static and cannot move
  • Merged objects must use the same material

Static Batching also requires extra memory to store the combined geometry, which wastes a certain amount of memory. In short, it trades memory for CPU time, so choose which objects to include carefully according to the actual situation, to avoid unnecessary memory problems while gaining CPU performance.
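A rough way to estimate that memory cost is sketched below. It assumes each statically batched copy of a mesh gets its own copy of the geometry in the combined buffers, and it uses a hypothetical 32-byte vertex layout (position 12 + normal 12 + UV 8 bytes) with 16-bit indices; real layouts vary by mesh and by the Optimize Mesh Data setting, so treat the numbers as illustrative only. `static_batch_extra_bytes` is a hypothetical helper.

```python
def static_batch_extra_bytes(vertex_count, bytes_per_vertex, index_count,
                             instance_count, bytes_per_index=2):
    """Rough extra memory for statically batching `instance_count` copies of one mesh.

    Assumption: every copy's vertices and indices are duplicated into the
    combined buffers, so the cost grows linearly with the number of copies.
    """
    per_copy = vertex_count * bytes_per_vertex + index_count * bytes_per_index
    return per_copy * instance_count


# A 24-vertex, 36-index cube with the assumed 32-byte vertex layout, 10,000 copies.
extra = static_batch_extra_bytes(24, 32, 36, 10_000)
print(extra)  # 8400000 bytes, roughly 8 MiB
```

Even a simple cube adds up quickly when thousands of copies are marked static, which is the memory-for-CPU trade described above.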

To use Static Batching, first enable the Static Batching option under Project Settings > Player.

Next, tick Batching Static for the required objects in the Inspector panel; the exact location is shown in the following figure:


2. Dynamic Batching

Dynamic Batching also merges objects that share a material, but the objects may move and the merging happens at runtime. To enable it, tick Dynamic Batching under Project Settings > Player. Note that in the URP template this option has moved to the URP configuration asset; its location is shown in the figure:

Although Dynamic Batching is simple to set up, its conditions of use are very strict, and a series of requirements must be met before batching takes effect. Unity's official documentation lists them in detail:

  • Batching dynamic game objects has a per-vertex overhead, so batching will only be applied to meshes that contain no more than 900 vertex attributes in total and no more than 300 vertices. Up to 300 vertices can be batched if the shader uses vertex positions, normals, and a single UV, while only 180 vertices can be batched if the shader uses vertex positions, normals, UV0, UV1, and tangents.

  • If GameObjects have mirroring in their transforms, they will not be batched (for example, GameObject A with a +1 scale and GameObject B with a -1 scale cannot be batched together). Using different Material instances also prevents GameObjects from being batched together, even if they are otherwise essentially the same. The exception is shadow caster rendering.

  • GameObjects with lightmaps have additional renderer parameters: lightmap index and lightmap offset/scale. In general, lightmapped GameObjects must point to exactly the same lightmap location to be batched.

  • Multi-pass shaders break batching.

  • Almost all Unity shaders support multiple lights in forward rendering by performing an extra pass for each additional light. Draw calls for additional per-pixel lights are not batched.

  • The legacy deferred (lighting pre-pass) rendering path disables dynamic batching because it has to draw each GameObject twice.
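The vertex rule in the list above reduces to simple arithmetic: the usable vertex count is the smaller of 300 and 900 divided by the number of attributes per vertex. The Python sketch below encodes that, plus two of the other rules (no mirroring, single pass), as a simplified eligibility check. It is an illustration only: it ignores the lightmap and material-instance rules, it detects mirroring from scale alone (a negative product of the three scale components), and the function names are hypothetical.

```python
MAX_DYNAMIC_VERTICES = 300
MAX_DYNAMIC_VERTEX_ATTRIBUTES = 900


def max_dynamic_batch_vertices(attributes_per_vertex):
    """Largest vertex count that stays within both documented limits."""
    return min(MAX_DYNAMIC_VERTICES,
               MAX_DYNAMIC_VERTEX_ATTRIBUTES // attributes_per_vertex)


def can_dynamic_batch(vertex_count, attributes_per_vertex, scale, pass_count):
    """Simplified check covering only the vertex, mirroring and multi-pass rules."""
    sx, sy, sz = scale
    mirrored = sx * sy * sz < 0  # an odd number of negative axes flips the handedness
    return (vertex_count <= max_dynamic_batch_vertices(attributes_per_vertex)
            and not mirrored
            and pass_count == 1)


print(max_dynamic_batch_vertices(3))             # position, normal, UV -> 300
print(max_dynamic_batch_vertices(5))             # position, normal, UV0, UV1, tangent -> 180
print(can_dynamic_batch(24, 3, (1, 1, 1), 1))    # True
print(can_dynamic_batch(24, 3, (-1, 1, 1), 1))   # False: mirrored
```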

That looks like a lot, but in short: the mesh must be simple and the shader must have a single pass. Because of the single-pass requirement, lit objects under (legacy) deferred rendering cannot be dynamically batched: lighting is processed in separate passes, so Dynamic Batching is blocked outright.

3. GPU Instancing

Use GPU Instancing to draw (or render) multiple copies of the same mesh at once with a small number of draw calls. It is useful for drawing objects that appear repeatedly in a scene, such as buildings, trees, and grass:

  • GPU Instancing renders only the same mesh in each draw call, but each instance can have different parameters (for example, color or scale) to add variation and reduce visible repetition.

  • GPU Instancing reduces the number of draw calls per scene, which can significantly improve the rendering performance of your project.

Like the other batching methods, GPU Instancing has some usage restrictions:

  • Unity automatically picks MeshRenderer components and Graphics.DrawMesh calls to instance. Note that SkinnedMeshRenderer is not supported.

  • Unity only batches GameObjects that share the same mesh and the same material into a single GPU instanced draw call. Using a small number of meshes and materials improves instancing efficiency. To create variations, modify your shader to add per-instance data.

The official documentation for this description is: GPU Instancing

The above is how the official documentation describes GPU Instancing. Unlike the other two batching methods, besides requiring the same material it only works on objects that share the same mesh. As the name Instancing suggests, the GPU instantiates the object directly, which reduces the CPU cost of preparing draw commands for scene objects.
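The grouping rule can be illustrated with a short Python model: renderers are bucketed by the (mesh, material) pair, and each bucket becomes one or more instanced draws carrying a per-instance array (here a color, matching the "different parameters" idea above). The per-draw instance cap is a hypothetical parameter; Unity's real limit depends on the platform and on how much per-instance data the shader uses, so no specific number is assumed here. `build_instanced_draws` is a hypothetical name.

```python
from collections import defaultdict


def build_instanced_draws(renderers, max_instances_per_draw):
    """Group renderers into instanced draws keyed by (mesh, material).

    renderers: list of dicts with "mesh", "material" and a per-instance "color".
    Returns a list of (mesh, material, [per-instance colors]) draws.
    """
    groups = defaultdict(list)
    for renderer in renderers:
        groups[(renderer["mesh"], renderer["material"])].append(renderer["color"])

    draws = []
    for (mesh, material), colors in groups.items():
        # Split each group so that no single draw exceeds the hypothetical cap.
        for start in range(0, len(colors), max_instances_per_draw):
            draws.append((mesh, material, colors[start:start + max_instances_per_draw]))
    return draws


renderers = ([{"mesh": "tree", "material": "bark", "color": (0.2, 0.5, 0.1)}] * 5
             + [{"mesh": "rock", "material": "stone", "color": (0.5, 0.5, 0.5)}] * 2)
draws = build_instanced_draws(renderers, max_instances_per_draw=4)
print(len(draws))  # 3: tree x4, tree x1, rock x2
```

Seven renderers become three draws, and adding more trees or rocks grows the per-instance arrays rather than the draw count, until the cap forces a split.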

4. SRP Batcher

Official documentation for the SRP Batcher: SRP Batcher. If you would rather not read the official documentation, I have moved its content here and added some explanatory text.

Enabling the SRP Batcher:
To use the SRP Batcher, the project must use a Scriptable Render Pipeline (SRP). The Scriptable Render Pipeline can be:

  • Universal Render Pipeline (URP)
  • High Definition Render Pipeline (HDRP)
  • A custom SRP

Since the latter two are less commonly used, this article is based on the URP template. For details about URP, see this article: Unity upgrade project to URP (Universal Render Pipeline) and screen post-processing

When the project uses the URP template, you can find the project's URP configuration asset in the Assets directory, where the SRP Batcher option is located:

Also, under the URP template, the Dynamic Batching switch has moved into the same configuration asset, and unlike in the built-in render pipeline it is turned off by default, because Dynamic Batching offers no advantage over the SRP Batcher.

How the SRP Batcher works:

In Unity, you can modify the properties of any material at any time during a frame. However, this has some drawbacks. For example, there is a lot of work to do when a DrawCall uses a new material, so the more materials a scene contains, the more CPU time Unity must spend setting up GPU data. The traditional way to deal with this is to reduce the number of DrawCalls to optimize the CPU rendering cost, because Unity has to set up a lot of state before it issues each DrawCall. The real CPU cost comes from that setup, not from the GPU DrawCall itself, which is just a few bytes that Unity pushes to the GPU command buffer.

As described above, the CPU cost of SetPass Calls in the rendering stage comes mainly from the work done when switching materials. The SRP Batcher trades this per-material CPU preparation for data buffers that persist in GPU memory, thereby reducing the CPU's data preparation load.

The SRP Batcher reduces the GPU setup between DrawCalls by batching a sequence of Bind and Draw GPU commands. The process is shown in the figure:

To get maximum rendering performance, these batches must be as large as possible. To achieve this, you can use many different materials that share the same shader, but you must use as few shader variants as possible.
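A toy Python model shows why "many materials, few shader variants" is the sweet spot. It treats a run of consecutive draws as one batch, breaking on every material change for a traditional renderer and only on a shader variant change for an SRP-Batcher-style renderer. This is a deliberate simplification of the behaviour described in the documentation, not Unity's actual batching logic; the variant names and helper functions are hypothetical.

```python
def count_runs(keys):
    """Count runs of equal consecutive keys; each run is treated as one batch."""
    runs = 0
    previous = object()  # sentinel that never equals a real key
    for key in keys:
        if key != previous:
            runs += 1
            previous = key
    return runs


def compare_batching(draws):
    """draws: (shader_variant, material) pairs in submission order.

    Returns (material_batches, srp_batches): the first breaks on every material
    change, the second only on shader variant changes (simplified model).
    """
    material_batches = count_runs(material for _variant, material in draws)
    srp_batches = count_runs(variant for variant, _material in draws)
    return material_batches, srp_batches


one_variant = [("Lit", f"mat{i}") for i in range(6)]
mixed_variants = [("Lit", "mat0"), ("Lit_NORMALMAP", "mat1"),
                  ("Lit", "mat2"), ("Lit_NORMALMAP", "mat3")]
print(compare_batching(one_variant))     # (6, 1)
print(compare_batching(mixed_variants))  # (4, 4)
```

Six materials on one variant collapse into a single batch in this model, while alternating variants gain nothing, matching the advice to minimise shader variants.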

In the inner render loop, when Unity detects a new material, the CPU collects all its properties and sets up different constant buffers in GPU memory. The number of GPU buffers depends on how the shader declares its CBUFFERs.

To speed up the general case where a scene uses many different materials but few shader variants, the SRP natively integrates paradigms such as GPU data persistence.

The SRP Batcher is a low-level render loop that keeps material data persistent in GPU memory. If the material content does not change, the SRP Batcher does not need to set up the buffer and upload it to the GPU. Instead, it uses a dedicated code path to quickly update Unity engine properties in a large GPU buffer, like this:

This is the SRP Batcher rendering workflow. The SRP Batcher uses a dedicated code path to quickly update Unity engine properties in a large GPU buffer. Here, the CPU only handles the Unity engine properties, labeled Per Object large buffer in the image above. All materials have persistent CBUFFERs in GPU memory that are ready to use at any time. This speeds up rendering because:
  • All material content now persists in GPU memory.
  • Dedicated code manages a large per-object GPU CBUFFER for all per-object properties.
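The persistence idea can be modelled in a few lines of Python. The sketch compares a traditional path, which sets up material constants on every material switch, with a persistent path that uploads a material's buffer once and re-uploads it only when its content changes, while per-object engine data is written into one large buffer on every draw. All names are hypothetical and the counts are model outputs, not measurements of Unity.

```python
def simulate(frames):
    """Compare material-constant uploads between a traditional and a persistent path.

    frames: list of frames; each frame is a list of (material, properties) draws.
    Returns (traditional_uploads, persistent_uploads, per_object_updates).
    """
    traditional_uploads = 0
    persistent = {}          # material -> properties currently resident in "GPU memory"
    persistent_uploads = 0
    per_object_updates = 0   # engine data such as object-to-world, written every draw
    for draws in frames:
        previous = None
        for material, props in draws:
            # Traditional path: set up the material buffer on every material switch.
            if material != previous:
                traditional_uploads += 1
                previous = material
            # Persistent path: upload only if the material's content changed.
            if persistent.get(material) != props:
                persistent[material] = dict(props)
                persistent_uploads += 1
            # Per-object properties go into one large buffer for every draw.
            per_object_updates += 1
    return traditional_uploads, persistent_uploads, per_object_updates


frame = [("red", {"color": 1.0}), ("blue", {"color": 2.0}), ("red", {"color": 1.0})]
print(simulate([frame, frame]))  # (6, 2, 6)
```

After the first frame, unchanged materials cost nothing to re-bind in the persistent path; only the per-object data is updated, which is the work the dedicated code path is optimised for.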

SRP Batcher restrictions:

For the SRP Batcher code path to render an object:

  • The rendered object must be a mesh or skinned mesh. The object cannot be a particle.

  • The shader must be compatible with the SRP Batcher. All lit and unlit shaders in HDRP and URP meet this requirement (except the "particle" versions of these shaders). To make a shader compatible with the SRP Batcher:

      • All built-in engine properties must be declared in a single CBUFFER named UnityPerDraw, for example unity_ObjectToWorld or unity_SHAr.

      • All material properties must be declared in a single CBUFFER named UnityPerMaterial.
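The material-property rule can be checked roughly with plain text processing. The Python sketch below extracts the variables declared between `CBUFFER_START(name)` and `CBUFFER_END` and tests whether every given material property sits inside `UnityPerMaterial`. It is a rough heuristic under clear assumptions (simple `float`/`half`/`int`/`uint` declarations, no macros or includes), not how Unity decides compatibility; the shader's Inspector reports the authoritative SRP Batcher status. The function names and the sample property names are hypothetical.

```python
import re

# CBUFFER_START(Name) ... CBUFFER_END, across multiple lines.
CBUFFER_PATTERN = re.compile(r"CBUFFER_START\((\w+)\)(.*?)CBUFFER_END", re.S)
# Simple declarations such as "float4 _BaseColor;" or "half _Smoothness;".
DECL_PATTERN = re.compile(r"\b(?:float|half|int|uint)\d?(?:x\d)?\s+(\w+)\s*;")


def cbuffer_contents(hlsl):
    """Map each CBUFFER name to the set of variable names declared inside it."""
    return {name: set(DECL_PATTERN.findall(body))
            for name, body in CBUFFER_PATTERN.findall(hlsl)}


def looks_srp_batcher_compatible(hlsl, material_properties):
    """Heuristic: every material property must be declared in UnityPerMaterial."""
    per_material = cbuffer_contents(hlsl).get("UnityPerMaterial", set())
    return set(material_properties) <= per_material


shader_source = """
CBUFFER_START(UnityPerMaterial)
    float4 _BaseColor;
    float _Smoothness;
CBUFFER_END
"""
print(looks_srp_batcher_compatible(shader_source, ["_BaseColor", "_Smoothness"]))  # True
print(looks_srp_batcher_compatible(shader_source, ["_BaseColor", "_Cutoff"]))      # False
```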

Batching performance tests

Traditional batching:

Generally speaking, batching reduces the data processing needed to render a scene, and therefore the CPU load during rendering. With Unity's performance analysis tool, the Profiler, you can easily see the related values:


Select the Rendering module (which also provides an Open Frame Debugger button) to see batching-related statistics in the panel, specifically the number of Draw Calls handled by each of the three batching techniques: Static Batching, Dynamic Batching, and GPU Instancing.

Of course, we can also analyze CPU usage to find bottlenecks on the CPU side:

Select the CPU Usage module and you can find BatchRenderer.Flush in the panel below. This marker is worth watching because it directly affects rendering performance, and its Self time shows how much CPU time it currently takes:

When expanded, it shows up to four child markers:

  • Render.Mesh: objects that could not be batched and are processed individually by the CPU
  • Batch.DrawInstanced: objects processed with GPU Instancing
  • Batch.DrawStatic: objects processed with Static Batching
  • Batch.DrawDynamic: objects processed with Dynamic Batching

In the scene analyzed in the screenshots above, we placed 30000 Cubes and applied a different batching method to each group, to analyze the cost of the whole batching process and record Draw Call counts and timings. Because Dynamic Batching has strict usage conditions and brings no obvious CPU benefit, it is the weakest candidate in this comparison. The objects in the scene are batched as follows:

  • 10000 objects: Static Batching
  • 10000 objects: GPU Instancing
  • 10000 objects: Dynamic Batching
  • Two extra objects: not batched

Monitoring CPU performance in the Profiler shows that, of the three batching methods, static batching is the most efficient, followed by GPU Instancing, while Dynamic Batching is relatively poor. Note that when there are many objects in the scene, the values observed with the analysis method above show GPU Instancing taking more time than Dynamic Batching, but the total CPU time of the two techniques over the whole rendering stage is actually the reverse. We can see this clearly by switching the Profiler view from Hierarchy to Timeline:


As the figure above shows, although Dynamic Batching's own time (the sum of the short segments below) is relatively small, it increases the time of the corresponding BatchRenderer.Flush (the segment above). So when comparing the benefits of these methods, switch to Timeline mode to analyze the overall time cost.

For what BatchRenderer.Flush contains, see the explanation given by a Unity staff member on the official forum. Here is the original post:

SRP Batcher:

When the SRP Batcher is enabled in the project, other batching methods stop taking effect, much as Static Batching blocks GPU Instancing. The difference is that there is no specific documentation on this, so it is only an assumption. However, a simple experiment supports it: with the SRP Batcher turned off, apply batching to several objects, and the Profiler shows static batching taking effect:
Then, after enabling the SRP Batcher:

So my simple understanding is that the SRP Batcher blocks the other batching methods. To observe the SRP Batcher's cost, find SRPBatcher.Flush directly in the Timeline view, as shown in the figure:

Summary

The methods above are the mature and effective batching options available in Unity. Each has its own advantages and disadvantages, so choose the appropriate one for the actual scenario. For example, if your memory budget is very limited, avoid static batching so as not to increase memory pressure. Don't forget the price you pay for the benefits these techniques bring.


Origin blog.csdn.net/xinzhilinger/article/details/121121772