Unity performance optimization: batching and culling

Batches have a relatively large impact on rendering performance. Too many batches mean too many draw submissions from the CPU, which makes each frame take too long to render, so we need to reduce both the number of Batches and the number of SetPass calls.
There are several ways to batch draws, listed below:

Manual batching

Meshes that share a material are merged into one new Mesh so that they can be drawn in a single call. This is the most direct approach, but it is used less often now because it is cumbersome. It increases memory and package size, enlarges the space the objects occupy in the lightmap, and requires LODs to be recreated. Also, if the merged object is too large, it samples a single LightProbe and ReflectionProbe, so lighting no longer varies across it. Related plugin: MeshBaker.
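To make the merge concrete, here is a minimal Python model of what combining meshes does. It is a language-neutral sketch, not Unity code. `Mesh`, `translate`, and `combine_meshes` are hypothetical names. Transforms are reduced to translations for brevity. Each part's vertices are baked into world space, and its indices are shifted by the number of vertices already written.

```python
from dataclasses import dataclass


@dataclass
class Mesh:
    vertices: list  # (x, y, z) positions
    indices: list   # triangle list, three indices per triangle


def translate(offset):
    """Return a transform that moves a vertex by `offset` (stand-in for a full matrix)."""
    ox, oy, oz = offset
    return lambda v: (v[0] + ox, v[1] + oy, v[2] + oz)


def combine_meshes(parts):
    """Merge (mesh, transform) pairs that share one material into a single mesh.

    Vertices are baked into world space, and each part's indices are shifted by
    the number of vertices already written, so the result can be drawn in one call.
    """
    vertices, indices = [], []
    for mesh, transform in parts:
        base = len(vertices)
        vertices.extend(transform(v) for v in mesh.vertices)
        indices.extend(base + i for i in mesh.indices)
    return Mesh(vertices, indices)


quad = Mesh([(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)], [0, 1, 2, 0, 2, 3])
combined = combine_meshes([(quad, translate((0, 0, 0))), (quad, translate((5, 0, 0)))])
print(len(combined.vertices), combined.indices)
```

Every copy of `quad` adds its own four vertices to the result. This is why manual merging grows memory and package size. Inside Unity, `Mesh.CombineMeshes` performs this kind of merge.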

Static batching

Principle:

Before runtime, Unity combines the meshes of static objects. It transforms their vertices into world space, builds a shared vertex buffer and index buffer, and uploads that data to the GPU at runtime. In the Editor this happens when you enter Play mode; in a player build it happens at build time, and the result is stored with the scene. The static objects are not actually merged in the scene: each one remains a separate object, although at runtime you will see that it uses a combined Mesh. Each object is still culled, sorted, and rendered through the normal process. When an object is drawn, its renderer issues a draw call that only specifies an offset and a range within the shared index buffer, which is very fast. Static batching therefore does not reduce the number of draw calls; it reduces the number of SetPass calls and the amount of data submitted. Put simply, static batching changes what is submitted rather than how many times. The mesh data is submitted once, and each draw only points at a different range of it.
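The shared-buffer-plus-ranges idea can be modelled in a few lines of Python. This is a simplified sketch, not Unity's implementation. `StaticObject`, `DrawRange`, `build_static_batch`, and `render_frame` are hypothetical names. The model also assumes every object in the batch uses the same material, so state is set up once. Per-object culling still decides which ranges get drawn.

```python
from dataclasses import dataclass


@dataclass
class StaticObject:
    name: str
    vertices: list        # world-space positions (already baked)
    indices: list         # triangle indices local to this object
    visible: bool = True  # result of this frame's per-object culling


@dataclass
class DrawRange:
    name: str
    index_start: int  # offset into the shared index buffer
    index_count: int  # number of indices this object owns


def build_static_batch(objects):
    """Build one shared vertex/index buffer plus the index range each object owns."""
    vertex_buffer, index_buffer, ranges = [], [], {}
    for obj in objects:
        base = len(vertex_buffer)
        start = len(index_buffer)
        vertex_buffer.extend(obj.vertices)
        index_buffer.extend(base + i for i in obj.indices)
        ranges[obj.name] = DrawRange(obj.name, start, len(obj.indices))
    return vertex_buffer, index_buffer, ranges


def render_frame(objects, ranges):
    """Bind the shared buffers once, then issue one ranged draw per visible object."""
    set_pass_calls = 1  # assumes every object in the batch uses the same material
    draws = [ranges[obj.name] for obj in objects if obj.visible]
    return set_pass_calls, draws


tri = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]
objects = [
    StaticObject("a", tri, [0, 1, 2]),
    StaticObject("b", tri, [0, 1, 2], visible=False),  # culled this frame
    StaticObject("c", tri, [0, 1, 2]),
]
vb, ib, ranges = build_static_batch(objects)
set_pass, draws = render_frame(objects, ranges)
print(set_pass, [(d.name, d.index_start, d.index_count) for d in draws])
```

Two draws are still issued for the visible objects `a` and `c`. Both draws read from the same buffers, and only one state setup is needed.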

Conditions:

Objects must use the same material, meaning the same shader with identical parameter values, to be batched together. Their lightmap, light probes, reflection probe, and per-object lights must also match. It is recommended to raise the lightmap size to 2K so that more objects share one lightmap. Where possible, avoid the other per-object influences, or enlarge their area of influence so that objects share a single one.
Batched vertices are stored in world space, so shaders that depend on object-space coordinates will produce incorrect results.
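One way to read the conditions above is as a "batch key": objects can only share a batch when every batch-relevant field is identical. The Python sketch below models a subset of those fields. `RendererInfo` and `static_batch_groups` are hypothetical names, and Unity's real grouping logic checks more state than this.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class RendererInfo:
    name: str
    material: str          # same shader and identical parameter values
    lightmap_index: int    # which lightmap the object samples
    reflection_probe: str  # reflection probe affecting the object
    lights: tuple = ()     # per-object lights affecting the object


def static_batch_groups(renderers):
    """Group renderers whose batch-relevant state matches exactly."""
    groups = defaultdict(list)
    for r in renderers:
        key = (r.material, r.lightmap_index, r.reflection_probe, r.lights)
        groups[key].append(r.name)
    return list(groups.values())


renderers = [
    RendererInfo("rock_a", "Rock", 0, "probe_1"),
    RendererInfo("rock_b", "Rock", 0, "probe_1"),
    RendererInfo("rock_c", "Rock", 1, "probe_1"),  # different lightmap: new group
]
print(static_batch_groups(renderers))
```

This also shows why a larger lightmap helps. When more objects fit in one lightmap, more of them share `lightmap_index` and fall into the same group.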

Advantages:

Merging is automatic and non-destructive: it does not change any data in the scene. Objects can still be culled individually, and LOD still works.

Disadvantages:

Memory and package size increase because every copy's vertices are stored in the combined mesh. This is especially costly for heavily duplicated objects such as trees and grass. Each batch contains at most 65,000 vertices; beyond that, Unity starts an additional batch.
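The vertex cap means a large set of static objects is split into several batches. The Python sketch below packs objects greedily in scene order. `split_into_batches` is a hypothetical name, and the cap defaults to the 65,000 figure quoted above. Unity's exact packing order and limit may differ by version and platform.

```python
def split_into_batches(vertex_counts, max_vertices=65_000):
    """Pack objects (given by vertex count, in order) into batches under a vertex cap.

    A new batch starts whenever adding the next object would exceed the cap.
    Simplification: an object larger than the cap still ends up in its own batch.
    """
    batches, current, current_total = [], [], 0
    for i, count in enumerate(vertex_counts):
        if current and current_total + count > max_vertices:
            batches.append(current)
            current, current_total = [], 0
        current.append(i)
        current_total += count
    if current:
        batches.append(current)
    return batches


trees = [20_000] * 7  # seven copies of the same 20k-vertex tree
print(split_into_batches(trees))
```

The seven trees end up in three batches. Because each copy is stored separately, the combined data holds 7 × 20,000 = 140,000 vertices instead of the 20,000 of a single shared mesh.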

GPU Instancing

Principle

GPU Instancing sends a mesh to the GPU once and renders many copies of it in one draw call, using an array of transformation matrices and MaterialPropertyBlocks.
For an instanced draw, Unity collects all the required information (transformation matrices and material property blocks) into arrays indexed by instance ID. The matrices hold world-space transforms, so if the batched objects change every frame, the data has to be rebuilt every frame, which costs performance. Material property blocks customize individual per-instance properties. Textures are not supported per instance, though, and need additional custom work. To expose a per-instance property, declare it in an instancing constant buffer in the shader:

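```hlsl
// Per-instance properties live in this constant buffer.
UNITY_INSTANCING_BUFFER_START(Props)
  UNITY_DEFINE_INSTANCED_PROP(float4, _Color)
UNITY_INSTANCING_BUFFER_END(Props)
// Read the value in the vertex/fragment function with:
// float4 color = UNITY_ACCESS_INSTANCED_PROP(Props, _Color);
```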

Every property that varies per instance must be declared between UNITY_INSTANCING_BUFFER_START and UNITY_INSTANCING_BUFFER_END. The values then have to be supplied from script in one of two ways. You can build the arrays manually and pass them when calling DrawMeshInstanced, or you can set them on each Renderer component with a MaterialPropertyBlock.
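The "arrays indexed by instance ID" idea can be modelled as below. This is a Python sketch, not Unity's API. `Instance`, `build_instance_arrays`, and `vertex_stage` are hypothetical names. `vertex_stage` plays the role of the vertex shader fetching its own matrix and `_Color` by instance ID.

```python
from dataclasses import dataclass


def translation(tx, ty, tz):
    """4x4 row-major matrix that only translates."""
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


@dataclass
class Instance:
    matrix: list  # world transform; must be refreshed whenever the object moves
    color: tuple  # per-instance value, like _Color in the shader snippet


def build_instance_arrays(instances):
    """Pack per-instance data into parallel arrays indexed by instance ID."""
    return {
        "matrices": [inst.matrix for inst in instances],
        "_Color": [inst.color for inst in instances],
    }


def vertex_stage(arrays, instance_id, local_pos):
    """Stand-in for the vertex shader: fetch this instance's data by ID."""
    m = arrays["matrices"][instance_id]
    x, y, z = local_pos
    world = tuple(m[r][0] * x + m[r][1] * y + m[r][2] * z + m[r][3] for r in range(3))
    return world, arrays["_Color"][instance_id]


arrays = build_instance_arrays([
    Instance(translation(0, 0, 0), (1, 0, 0, 1)),
    Instance(translation(3, 0, 0), (0, 1, 0, 1)),
])
print(vertex_stage(arrays, 1, (1, 1, 0)))
```

The mesh is shared by every instance; only the small per-instance arrays differ. If objects move every frame, `build_instance_arrays` (or an equivalent update) has to run every frame, which is the CPU cost mentioned above.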

Implementations

  1. Enable the GPU Instancing option on the material.
    This is the most common method and the automatic form of GPU Instancing: once the option is checked, instanced batching happens automatically. The important requirement is that the shader supports it; Unity's built-in shaders all do.
    Unity collects the objects the camera currently sees and builds the constant buffer dynamically from them. If objects need different property values, you can add extra per-instance properties to the shader, declared between the UNITY_INSTANCING_BUFFER macros.
  2. DrawMeshInstanced
    Here GPU Instancing is controlled from code. It draws up to 1023 objects per call. This limit comes mainly from the capacity of the constant buffer (CBuffer), which is at most 64 KB. With this method, a script manages a persistent constant buffer that does not change every time the set of objects seen by the camera changes. Better CPU and GPU performance is obtained at the cost of some CPU memory. Because this method bypasses Unity's rendering framework, features such as culling and LOD are not applied to each model.
  3. DrawMeshInstancedIndirect / DrawMeshInstancedProcedural
    (See Unity's official documentation.) These methods supply instance data through a ComputeBuffer, which the shader reads as a StructuredBuffer. Its capacity can be far larger than a constant buffer's, so a huge number of instances can be drawn. Combined with a compute shader, it also enables GPU frustum culling and Hi-Z occlusion culling. The downside is poorer compatibility: it requires shader model 4.5 or above and hardware support for compute shaders and compute buffers.
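The 1023 figure lines up with the 64 KB constant buffer. One float4x4 matrix is 16 floats × 4 bytes = 64 bytes, and 65,536 / 64 = 1,024. The Python sketch below does that arithmetic and splits a large instance list into per-call chunks. The constant and function names are hypothetical. The sketch only counts the transform matrix, so treat it as a simplification.

```python
CONSTANT_BUFFER_BYTES = 64 * 1024  # 64 KB constant buffer limit cited above
MATRIX_BYTES = 4 * 4 * 4           # one float4x4: 16 floats x 4 bytes
UNITY_MAX_INSTANCES = 1023         # per-call cap for DrawMeshInstanced


def max_instances_per_call():
    """65536 // 64 = 1024 matrices fit; the documented per-call cap is 1023."""
    return min(CONSTANT_BUFFER_BYTES // MATRIX_BYTES, UNITY_MAX_INSTANCES)


def chunk_instances(matrices, limit=None):
    """Split a large instance list into chunks, one instanced draw call per chunk."""
    limit = limit or max_instances_per_call()
    return [matrices[i:i + limit] for i in range(0, len(matrices), limit)]


chunks = chunk_instances(list(range(2500)))  # placeholders for 2500 matrices
print(max_instances_per_call(), [len(c) for c in chunks])
```

In a Unity script, each chunk would correspond to one `Graphics.DrawMeshInstanced(mesh, submeshIndex, material, matrices)` call. For counts far beyond a few thousand, the ComputeBuffer-based methods in item 3 avoid this per-call cap.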

Requirements

  1. Requires the same Mesh
  2. Requires the same material
  3. The shader must support GPU Instancing

Advantages

  1. Memory usage does not grow much as the number of objects increases, because the mesh is stored only once.
  2. Suitable for large numbers of repeated objects, such as trees, grass, and small stones.
  3. Supports per-instance properties, so individual objects can be customized.

Disadvantages

  1. Only objects that share the same Mesh can be batched.
  2. LOD breaks GPU Instancing batches, because objects at different LOD levels use different meshes. Static batching is not affected, because the LOD meshes were already merged into the combined model during batching.

Related plugin: GPU Instancer

SRP Batcher

Principle

The SRP Batcher batches by shader: objects do not need to share the same Mesh or material. It reduces the setup work done when switching between materials, and with it the communication between CPU and GPU. Mesh data is uploaded to the GPU once. The material data of all materials that use the SRP Batcher is kept in a persistent list, uploaded once and then updated each frame only where it changes. When rendering, each draw reads its data from an offset in that list, so the main effect is to merge SetPass calls.
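The difference in SetPass calls can be illustrated with a toy counter. The model is deliberately simplified: draws are assumed to be already in render order, and a new state setup is counted whenever the "state key" changes. The classic path keys on shader variant plus material, while the SRP Batcher keys only on shader variant. `count_setpass_calls` and the draw dictionaries are hypothetical, not Unity APIs, and real counts depend on many more factors.

```python
def count_setpass_calls(draws, key):
    """Count state setups for draws in render order: one per change of key(draw)."""
    count, previous = 0, object()  # sentinel so the first draw always counts
    for draw in draws:
        k = key(draw)
        if k != previous:
            count += 1
            previous = k
    return count


draws = [
    {"shader_variant": "Lit", "material": "Brick"},
    {"shader_variant": "Lit", "material": "Wood"},
    {"shader_variant": "Lit", "material": "Stone"},
    {"shader_variant": "Unlit", "material": "Glow"},
]
classic = count_setpass_calls(draws, key=lambda d: (d["shader_variant"], d["material"]))
srp_batcher = count_setpass_calls(draws, key=lambda d: d["shader_variant"])
print(classic, srp_batcher)
```

Three different materials on the same `Lit` variant collapse into one setup under the SRP Batcher model, which is why sharing shader variants matters more than sharing materials there.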

Requirements

  1. Objects must use the same shader variant: the same shader with the same keywords (macros). They also need the same render queue, blend mode, depth test/write settings, and so on.
  2. The shader must be SRP Batcher compatible. In a hand-written URP shader, material properties must be declared inside the UnityPerMaterial constant buffer, between CBUFFER_START(UnityPerMaterial) and CBUFFER_END.
    You can check whether a shader is SRP Batcher compatible in the shader's Inspector.

Advantages

  1. Less data upload: mesh and material data do not need to be re-uploaded for every draw.
  2. Far less communication between CPU and GPU.
  3. Different materials are supported, as long as they use the same shader variant.
  4. Supports skinned meshes (skeletal models).

Choosing a technique

  1. For static objects, use static batching.
  2. For large numbers of objects with the same mesh and material, use GPU Instancing.
  3. Skinned (skeletal) models can only use the SRP Batcher.
  4. Use the SRP Batcher for everything else.
  5. If a material is SRP Batcher compatible, GPU Instancing is not used even when it is enabled. To use GPU Instancing instead, modify the shader so that it is not SRP Batcher compatible.
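The selection rules above can be written as a small decision function. This Python sketch encodes only the five rules as listed. `choose_batching` and its boolean parameters are hypothetical, and it is a rough guide rather than Unity's internal logic.

```python
def choose_batching(is_static, is_skinned, srp_batcher_compatible,
                    many_copies_same_mesh_and_material):
    """Apply the technique-selection rules listed above, in order."""
    if is_skinned:
        return "SRP Batcher"  # rule 3: skeletal models can only use the SRP Batcher
    if is_static:
        return "Static batching"  # rule 1
    if many_copies_same_mesh_and_material:
        # Rule 5: an SRP Batcher-compatible shader takes priority over instancing.
        return "SRP Batcher" if srp_batcher_compatible else "GPU Instancing"
    return "SRP Batcher"  # rule 4: everything else


print(choose_batching(False, False, False, True))
```

For a forest of identical trees drawn with a shader made SRP Batcher-incompatible on purpose, the function returns `"GPU Instancing"`, matching rule 5.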

Culling

Unity's built-in culling methods are:

  1. Frustum culling: objects outside the camera's view frustum are not rendered.
  2. Layer cull distances: Camera.layerCullDistances sets a culling distance for each layer.
  3. Occlusion culling: objects hidden behind other objects are not rendered.
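The first two methods can be sketched with bounding spheres, as in the Python model below. Frustum culling rejects an object whose bounding sphere lies entirely outside any frustum plane. Per-layer distance culling mimics `Camera.layerCullDistances`, where 0 means "use the far plane". `Plane`, `Renderable`, and `is_visible` are hypothetical names. The example frustum (90° horizontal FOV, top and bottom planes omitted) is an assumption for brevity. Occlusion culling is not modelled.

```python
import math
from dataclasses import dataclass


@dataclass
class Plane:
    normal: tuple  # unit normal pointing into the frustum
    d: float       # plane equation: dot(normal, p) + d = 0


@dataclass
class Renderable:
    center: tuple  # bounding-sphere center in world space
    radius: float  # bounding-sphere radius
    layer: int     # 0..31


def signed_distance(plane, p):
    return sum(n * c for n, c in zip(plane.normal, p)) + plane.d


def is_visible(obj, planes, camera_pos, layer_cull_distances):
    # Frustum culling: the sphere is outside if it lies fully behind any plane.
    for plane in planes:
        if signed_distance(plane, obj.center) < -obj.radius:
            return False
    # Per-layer distance culling; 0 means no extra limit beyond the far plane.
    limit = layer_cull_distances[obj.layer]
    if limit > 0 and math.dist(camera_pos, obj.center) > limit:
        return False
    return True


s = 1 / math.sqrt(2)
planes = [
    Plane((0, 0, 1), -0.3),   # near plane at z = 0.3
    Plane((0, 0, -1), 100.0), # far plane at z = 100
    Plane((s, 0, s), 0.0),    # left side of a 90-degree horizontal FOV
    Plane((-s, 0, s), 0.0),   # right side
]
layer_cull_distances = [0.0] * 32
layer_cull_distances[8] = 50.0  # e.g. a layer for small props
camera = (0.0, 0.0, 0.0)

tree = Renderable((0, 0, 10), 1.0, 0)
off_screen = Renderable((-20, 0, 5), 1.0, 0)
far_prop = Renderable((0, 0, 60), 1.0, 8)
near_prop = Renderable((0, 0, 30), 1.0, 8)
print([is_visible(o, planes, camera, layer_cull_distances)
       for o in (tree, off_screen, far_prop, near_prop)])
```

`far_prop` is inside the frustum but beyond its layer's 50-unit distance, so it is culled anyway. That is exactly what per-layer distances add on top of frustum culling.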

Useful plugins on the Unity Asset Store:

  1. optimizers
  2. Scene Optimizer
  3. Culling the piece
  4. Perfect Culling - Occlusion Culling System

Optimizing the on-screen triangle count

For static models, the number of triangles on screen mainly affects GPU time.
For skinned models, the triangle count also affects CPU time for skeletal animation and cloth simulation.
Keep triangle counts as low as possible without hurting visual quality. Experienced 3D artists should check this during production.
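LOD is the usual way to cut on-screen triangles without a visible quality loss. Unity's LODGroup switches levels based on how much of the screen height an object covers. The Python sketch below approximates that idea for a perspective camera. `screen_height_fraction`, `pick_lod`, the threshold values, and the per-LOD triangle counts are all assumptions, and Unity's exact calculation differs in detail.

```python
import math


def screen_height_fraction(object_height, distance, vertical_fov_deg):
    """Approximate fraction of the screen height an object covers."""
    visible_height = 2.0 * distance * math.tan(math.radians(vertical_fov_deg) / 2.0)
    return object_height / visible_height


def pick_lod(fraction, thresholds):
    """thresholds[i] is the minimum screen fraction for LOD i (descending order).

    Returns the LOD index, or None when the object is too small and is culled.
    """
    for i, t in enumerate(thresholds):
        if fraction >= t:
            return i
    return None


thresholds = [0.5, 0.15, 0.02]
lod_triangles = [5000, 1500, 300]  # hypothetical triangle counts per LOD level
for distance in (2, 10, 100):
    lod = pick_lod(screen_height_fraction(2.0, distance, 60.0), thresholds)
    tris = 0 if lod is None else lod_triangles[lod]
    print(distance, lod, tris)
```

A 2-unit object drops from 5,000 to 1,500 triangles at 10 units and disappears entirely at 100 units. The LOD generators listed below produce the lower-poly meshes such a scheme needs.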
Related plug-ins:

  1. Poly Few | Mesh Simplifier and Auto LOD Generator
  2. Amplify Impostors
  3. Mesh Combine Studio 2

Other aspects

Beyond batching, watch for anything else that adds CPU-GPU interaction:
lighting, multiple passes, real-time shadows, planar reflections, real-time Reflection Probes, and multiple cameras.


Origin blog.csdn.net/qq_30100043/article/details/130448521