Foreword
This series collects practical notes on performance optimization drawn from day-to-day game development. This is the second article in the series. The previous article is linked here:
Part 1: Unity Performance Optimization: Resources
In early versions of Unity, there were three main batching methods:
- Static Batching
- Dynamic Batching
- GPU Instancing
Each of them comes with strict usage restrictions. After Unity introduced the Scriptable Render Pipeline (SRP), a new batching method, the SRP Batcher, was added to widen the scope and improve the efficiency of batching. This article briefly introduces these batching techniques.
Draw Call, Batches, SetPass Call
Before looking at batching itself, you need to understand a few reference values used to measure how much rendering work the CPU is doing.
Draw Call
In early versions of the Unity engine, the CPU cost of rendering was mostly measured by the number of Draw Calls. In the rendering pipeline the CPU works in the application stage, where it mainly prepares and submits data, and the Draw Call count is the number of times the CPU submits data to the GPU. A Draw Call itself is only a few bytes in the command stream. The main cost lies in the CPU's data preparation before it.
Batches
Because of batching, not every rendered object generates its own Draw Call, so a new measure was introduced: Batches.
SetPass Call
As mentioned above, the CPU's peak cost in the rendering stage is generally not the Draw Call itself but the data preparation before it, so the number of submissions is not an accurate measure. When the material changes between two consecutive draws, more work is required, and this is the most expensive CPU step in the whole rendering stage. Unity therefore uses SetPass Calls as the measure of this cost.
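To make the difference between the two counters concrete, here is a minimal sketch in Python. It is only a conceptual model, not Unity's actual accounting: it assumes that a SetPass call happens whenever the material of a draw differs from the material of the previous draw. The material names are hypothetical.

```python
from typing import List


def count_setpass_calls(draw_materials: List[str]) -> int:
    """Count how often the material changes between consecutive draws.

    Simplified model (an assumption): one SetPass call per material switch.
    """
    calls = 0
    previous = None
    for material in draw_materials:
        if material != previous:
            calls += 1
            previous = material
    return calls


# Hypothetical frame: six draws using two materials, interleaved.
interleaved = ["Rock", "Grass", "Rock", "Grass", "Rock", "Grass"]
# The same six draws, sorted so that equal materials are adjacent.
grouped = sorted(interleaved)

print("draw calls:", len(interleaved), "setpass:", count_setpass_calls(interleaved))
print("draw calls:", len(grouped), "setpass:", count_setpass_calls(grouped))
```

In this model both orderings issue six draws, but the interleaved order switches material six times while the grouped order switches only twice, which is why the Draw Call count alone says little about the CPU cost.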
Introduction to the main batching techniques
The following is a brief description of the batching methods commonly used in Unity. The descriptions mainly come from the official Unity documentation, some of them copied directly, so the information is relatively accurate:
1. Static Batching
According to the official Unity documentation, Static Batching works as follows:
- Static GameObjects are transformed into world space and a shared vertex and index buffer is built for them.
- If Optimized Mesh Data is enabled, Unity removes any vertex elements not used by any shader variant when building the vertex buffer. To do this, it performs some special keyword checks; for example, if Unity does not detect the LIGHTMAP_ON keyword, it removes the lightmap UVs from the batch.
- For visible GameObjects in the same batch, Unity performs a series of simple draw calls with almost no state changes between them. Technically, Unity does not reduce the number of API draw calls, but it does reduce the state changes between them (which is the expensive part).
On most platforms, a batch is limited to 64k vertices and 64k indices (48k indices on OpenGLES, 32k indices on macOS).
Simply put, Static Batching merges a number of small meshes into one combined mesh in memory, so that at render time the CPU can submit the merged data to the GPU in one go and reduce the Draw Call count. There are some restrictions:
- Objects must be static and cannot move
- Merged objects must use the same material
Static Batching also needs additional memory to store the combined geometry, which wastes a certain amount of memory. In other words, it trades memory for time, so objects should be added to static batches carefully and according to the actual situation, to avoid creating unnecessary memory problems while gaining the CPU advantage.
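The memory cost can be estimated with simple arithmetic. The sketch below is a rough illustration, not a measurement: it assumes that objects which shared one mesh before batching each get their own copy of the geometry in the combined buffer, and it uses hypothetical numbers (a 24-vertex cube, 32 bytes per vertex for position, normal and one UV set). Index buffers are ignored.

```python
def shared_mesh_bytes(vertices_per_mesh: int, bytes_per_vertex: int) -> int:
    """Vertex memory when every object references a single shared mesh."""
    return vertices_per_mesh * bytes_per_vertex


def combined_mesh_bytes(object_count: int, vertices_per_mesh: int,
                        bytes_per_vertex: int) -> int:
    """Vertex memory when each object is copied into the combined buffer."""
    return object_count * vertices_per_mesh * bytes_per_vertex


# Hypothetical scene: 10000 cubes, 24 vertices each,
# 32 bytes per vertex (12 position + 12 normal + 8 UV).
before = shared_mesh_bytes(24, 32)
after = combined_mesh_bytes(10000, 24, 32)

print("shared mesh:", before, "bytes")    # 768
print("combined mesh:", after, "bytes")   # 7680000
print("growth factor:", after // before)  # 10000
```

Under these assumptions the vertex data grows from under 1 KB to about 7.7 MB, which is the kind of cost to weigh against the CPU time saved.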
To use Static Batching, first tick the Static Batching option under Project Settings > Player.
Next, tick Batching Static for the required objects in the Inspector panel, at the position shown in the following figure:
2. Dynamic Batching
Dynamic Batching also merges objects that share a material, but the objects can be dynamic and the merging happens at runtime. To enable it, simply tick the Dynamic Batching box under Project Settings > Player. Note that in the URP template this option has been moved to the URP configuration file, at the location shown in the figure:
Although Dynamic Batching is very simple to set up, its usage conditions are strict, and a series of requirements must be met before any batching takes place. The official Unity documentation lists them in detail:
- Batching dynamic GameObjects has a per-vertex overhead, so batching is only applied to meshes that contain no more than 900 vertex attributes in total and no more than 300 vertices. Up to 300 vertices can be batched if the shader uses vertex position, normal, and a single UV, while only 180 vertices can be batched if the shader uses vertex position, normal, UV0, UV1, and tangent.
- GameObjects whose transforms are mirrored are not batched together (e.g., GameObject A with a +1 scale and GameObject B with a -1 scale cannot be batched together).
- Using different Material instances prevents GameObjects from being batched together, even if they are essentially the same. The exception is shadow caster rendering.
- GameObjects with lightmaps have additional renderer parameters: lightmap index and lightmap offset/scale. In general, dynamic lightmapped GameObjects should point to exactly the same lightmap location to be batched.
- Multi-pass shaders break batching.
- Almost all Unity shaders support several lights in forward rendering, effectively performing an extra pass for them. Draw calls for additional per-pixel lights are not batched.
- The legacy deferred (light pre-pass) rendering path disables dynamic batching because it has to draw GameObjects twice.
That looks like a lot, but in short: the model must be simple and the shader it uses must be single-pass. Because of the single-pass restriction, lit objects under deferred rendering cannot be dynamically batched, since lighting is handled in a separate pass, so Dynamic Batching is blocked outright.
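The two vertex limits quoted in the first condition above (300 and 180) both follow from the 900-attribute budget. A minimal sketch of that arithmetic, where the function name is hypothetical and the rule is simply "900 attributes in total, capped at 300 vertices":

```python
MAX_VERTEX_ATTRIBUTES = 900  # total attribute budget quoted above
MAX_VERTICES = 300           # hard vertex cap quoted above


def max_batchable_vertices(attributes_per_vertex: int) -> int:
    """Largest vertex count that still fits the dynamic batching limits."""
    if attributes_per_vertex <= 0:
        raise ValueError("a vertex needs at least one attribute")
    return min(MAX_VERTICES, MAX_VERTEX_ATTRIBUTES // attributes_per_vertex)


# position + normal + one UV = 3 attributes per vertex
print(max_batchable_vertices(3))  # 300
# position + normal + UV0 + UV1 + tangent = 5 attributes per vertex
print(max_batchable_vertices(5))  # 180
```

Every extra vertex attribute the shader reads shrinks the mesh size that can still be batched, which is one reason only very simple models qualify.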
3. GPU Instancing
Use GPU Instancing to draw (or render) multiple copies of the same mesh at once with a small number of draw calls. It is useful for drawing objects that appear repeatedly in a scene, such as buildings, trees, and grass:
- GPU Instancing renders only identical meshes in each draw call, but each instance can have different parameters (for example, color or scale) to add variation and reduce the appearance of repetition.
- GPU Instancing can reduce the number of draw calls used per scene, which can significantly improve the rendering performance of your project.
Like the other batching methods, GPU Instancing has some usage restrictions:
- Unity automatically picks MeshRenderer components and Graphics.DrawMesh calls for instancing. Note that SkinnedMeshRenderer is not supported.
- Unity only batches GameObjects that share the same mesh and the same material in a single GPU instancing draw call. Using a small number of meshes and materials improves instancing efficiency. To create variations, modify the shader script to add per-instance data.
The official link for that description is: GPU Instancing
The above is what the official documentation says about GPU Instancing. Unlike the other two batching methods, besides requiring the same material it mainly applies to objects that use the same mesh. As the name Instancing suggests, it is a technique in which the GPU instantiates an object directly, reducing the CPU cost of preparing data and commands for scene objects.
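The "same mesh, same material" rule can be pictured as a grouping step. The sketch below is a simplified model, not Unity's implementation: it assumes every group of renderers sharing a (mesh, material) pair becomes one instanced draw call and ignores any per-draw instance limit. The mesh and material names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Renderer = Tuple[str, str]  # (mesh name, material name)


def instanced_draw_calls(renderers: List[Renderer]) -> Dict[Renderer, int]:
    """Group renderers by (mesh, material); one instanced draw per group."""
    return dict(Counter(renderers))


# Hypothetical scene with six renderers.
scene = [
    ("Tree", "Bark"), ("Tree", "Bark"), ("Tree", "Bark"),
    ("Rock", "Stone"), ("Rock", "Stone"),
    ("Tree", "Stone"),  # same mesh as the trees, different material
]

groups = instanced_draw_calls(scene)
for (mesh, material), count in groups.items():
    print(f"{mesh} + {material}: {count} instance(s) in one draw call")
print("objects:", len(scene), "instanced draw calls:", len(groups))
```

Six objects collapse into three draws here. The tree with the "Stone" material cannot join the other trees because its material differs, which is why a small set of meshes and materials instances best.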
4. SRP Batcher
Link to the official documentation: SRP Batcher. If you would rather not read the official documentation, I have reproduced the relevant parts here and added some explanatory text.
Enabling the SRP Batcher:
To use the SRP Batcher, the project must use a Scriptable Render Pipeline. This can be:
- the Universal Render Pipeline (URP)
- the High Definition Render Pipeline (HDRP)
- a custom SRP
Since the latter two are less commonly used, this article is based on the URP template. For details about URP, see this article: Unity upgrade project to Urp (Universal Rendering Pipeline) and screen post-processing
When the project uses the URP template, the URP configuration file of the current project can be found in the resource directory, and the SRP Batcher option is visible there:
In a URP project, the Dynamic Batching switch is also moved into this configuration file. Unlike in the default render pipeline, it is turned off by default, because it has no advantage over the SRP Batcher.
How the SRP Batcher works:
In Unity, the properties of any material can be modified at any time within a frame. However, this has some drawbacks. For example, there is a lot of work to do when a DrawCall uses a new material, so the more materials there are in a scene, the more CPU time Unity must spend setting up GPU data. The traditional way to deal with this is to reduce the number of DrawCalls in order to lower the CPU rendering cost, because Unity has to set up a lot of things before issuing a DrawCall. The real CPU cost comes from that setup, not from the GPU DrawCall itself (a DrawCall is just a few bytes that Unity needs to push to the GPU command buffer).
As described above, the CPU cost in the rendering stage is mainly tied to SetPass Calls, that is, to the work done when switching materials. The SRP Batcher trades the preparation time of each new material for persistent storage of its data in GPU buffers, thereby reducing the data preparation pressure on the CPU.
The SRP Batcher reduces the GPU setup between DrawCalls by batching a sequence of Bind and Draw GPU commands. The process is shown in the figure:
For maximum rendering performance, these batches must be as large as possible. To achieve this, you can use as many different materials with the same shader as you like, but you must use as few shader variants as possible.
In the inner render loop, when Unity detects a new material, the CPU collects all of its properties and sets up different constant buffers in GPU memory. The number of GPU buffers depends on how the shader declares its CBUFFERs.
To speed things up in the general case, where a scene uses many different materials but few shader variants, the SRP natively integrates paradigms such as GPU data persistence.
The SRP Batcher is a low-level render loop that keeps material data persistent in GPU memory. If the material content does not change, the SRP Batcher does not need to set up the buffer and upload it to the GPU. Instead, the SRP Batcher uses a dedicated code path to quickly update Unity engine properties in a large GPU buffer, like this:
This is the SRP Batcher rendering workflow. Here, the CPU only handles the Unity engine properties, marked Per Object large buffer in the figure above. All materials have persistent CBUFFERs in GPU memory that are ready to use at any time. This speeds up rendering because:
- All material content now persists in GPU memory.
- Dedicated code manages one large per-object GPU CBUFFER for all per-object properties.
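The advice about many materials but few shader variants can be illustrated with a small model. This is a sketch under a stated assumption, not Unity's actual render loop: consecutive draws that use the same shader variant are treated as one SRP batch, and a change of variant starts a new batch. The variant and material names are hypothetical.

```python
from itertools import groupby
from typing import List, Tuple

Draw = Tuple[str, str]  # (shader variant, material name)


def srp_batches(draws: List[Draw]) -> List[Tuple[str, List[str]]]:
    """Split a draw sequence into runs that share one shader variant."""
    return [
        (variant, [material for _, material in run])
        for variant, run in groupby(draws, key=lambda draw: draw[0])
    ]


# Hypothetical frame: four materials, two shader variants.
frame = [
    ("Lit", "Red"), ("Lit", "Blue"), ("Lit", "Green"),
    ("Unlit", "UI"),
    ("Lit", "Red"),
]

batches = srp_batches(frame)
for variant, materials in batches:
    print(variant, materials)
print("draws:", len(frame), "batches:", len(batches))
```

In this model the three different "Lit" materials at the start still share one batch, while the single "Unlit" draw in the middle splits the sequence into three batches. Switching materials is cheap here; switching variants is what breaks a batch.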
SRP Batcher restrictions:
For an object to be rendered through the SRP Batcher code path:
- The rendered object must be a mesh or a skinned mesh. It cannot be a particle.
- The shader must be compatible with the SRP Batcher. All Lit and Unlit shaders in HDRP and URP meet this requirement (except the "particle" versions of these shaders).
To make a shader compatible with the SRP Batcher:
- All built-in engine properties, for example unity_ObjectToWorld or unity_SHAr, must be declared in a CBUFFER named UnityPerDraw.
- All material properties must be declared in a CBUFFER named UnityPerMaterial.
Batching performance testing
Traditional batching:
Generally speaking, batching reduces the data processing needed for scene rendering and so reduces the CPU pressure during rendering. With Unity's performance analysis tool, the Profiler, you can easily see the related values:
Click Rendering to see batching-related information in the panel below (next to Open Frame Debugger), specifically the number of Draw Calls handled by each of the three batching techniques: Static Batching, Dynamic Batching, and GPU Instancing.
Of course, we can also analyze the CPU cost to find the bottleneck on the CPU side:
By clicking CPU Usage, you can see BatchRenderer.Flush in the panel below. This is a noteworthy CPU-side entry that can affect rendering performance, and its Self column lets us evaluate its current impact on CPU time:
When expanded, you can see up to four sub-entries:
- Render.Mesh: CPU processing for objects that cannot be batched
- Batch.DrawInstanced: CPU processing for objects handled by GPU Instancing
- Batch.DrawStatic: CPU processing for objects handled by Static Batching
- Batch.DrawDynamic: CPU processing for objects handled by Dynamic Batching
In the scene analyzed in the screenshot above, we placed 30000 Cubes and applied a different batching method to each group, in order to analyze the cost of the whole batching process and collect statistics on Draw Calls and time. Because Dynamic Batching has strict usage conditions and brings no obvious CPU benefit, it is not the focus here. The batching method used for the objects in the scene is:
- 10000 objects: Static Batching
- 10000 objects: GPU Instancing
- 10000 objects: Dynamic Batching
- Two extra objects: not batched
Monitoring CPU performance in the Profiler shows that, of the three batching methods, static batching is the most efficient, followed by GPU Instancing, with Dynamic Batching relatively poor. It is worth noting that when the scene contains many objects, the values observed with the method above show GPU Instancing taking more time than Dynamic Batching, but the total time the two techniques take over the whole CPU rendering stage is actually the reverse. We can see this clearly by switching the Profiler mode from Hierarchy to Timeline:
As the figure above shows, although the time spent by Dynamic Batching itself (the sum of the short segments below) is relatively small, it increases the time of the corresponding BatchRenderer.Flush (the segment above). So when comparing the benefits of these methods, switch to Timeline mode and analyze the overall time.
What BatchRenderer.Flush actually does can be understood from a description given by a Unity technical staff member on the official forum. Here are that person's original words:
SRP Batcher:
When we turn on the SRP Batcher in the project, we find that the other batching methods no longer take effect, much like Static Batching blocks GPU Instancing. The difference is that there is no specific documentation for this behavior, so it is only an assumption. However, a simple experiment can confirm it. With the SRP Batcher turned off, apply dynamic batching to several specific objects, and the Profiler shows that the batching succeeds:
Then, after turning on the SRP Batcher switch:
So I can only conclude here that the SRP Batcher blocks the other batching methods. To observe the cost of the SRP Batcher, look for SRP Batcher.Flush directly in the Timeline, as shown in the figure:
Summary
For batching in Unity, the methods above are all mature and effective. Each has its own advantages and disadvantages, and you need to choose the appropriate one for the actual application scenario. In short, if your memory budget is very tight, do not use static batching, to avoid adding memory pressure. Do not forget the price you pay for the benefits these techniques bring.