Unity Performance Optimization (5): Rendering Module Pressure

CPU pressure

Batching

Before the GPU renders anything, the CPU sends data to it in batches, and each submission is one draw call. Between batches the GPU has to switch rendering state. Rendering state here means the properties that determine how an object looks on screen: its shader, textures, color, rendering mode (opaque, transparent, semi-transparent) and so on.
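The toy Python model below makes the difference between draw calls and render-state changes concrete. It is not Unity code, and the `RenderState` type and its fields are hypothetical. It counts how many state switches a given draw order needs. Grouping draws that share a state reduces the switches, while the number of draw calls stays the same.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RenderState:
    """Hypothetical bundle of the state a draw needs."""
    shader: str
    texture: str
    blend: str  # e.g. "opaque" or "transparent"


def count_state_changes(draw_order):
    """Count state switches when draws are issued in the given order.

    Each element is the RenderState of one draw call; a switch is needed
    whenever a draw uses a different state than the draw before it.
    """
    changes = 0
    current = None
    for state in draw_order:
        if state != current:
            changes += 1
            current = state
    return changes


rock = RenderState("Standard", "rock.png", "opaque")
tree = RenderState("Standard", "tree.png", "opaque")

interleaved = [rock, tree, rock, tree, rock, tree]
grouped = sorted(interleaved, key=lambda s: s.texture)

print(len(interleaved), count_state_changes(interleaved))  # 6 6
print(len(grouped), count_state_changes(grouped))          # 6 2
```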

Batching methods in Unity:

Priority:
SRP Batcher / Static Batching
GPU Instancing
Dynamic Batching

Conditions for using Draw Call Batching
1. Mesh Renderers, Trail Renderers, Line Renderers, Particle Systems and Sprite Renderers are supported. Only renderers of the same type can be batched together. Skinned Mesh Renderers are not supported.
2. Objects must use the same material. In scripts, use Renderer.sharedMaterial instead of Renderer.material: accessing Renderer.material creates a copy of the material, and that copy breaks batching.
3. Using a MaterialPropertyBlock also breaks batching, but it is still faster than using multiple materials.
4. Transparent objects are rendered strictly in order (back to front), so their batching is easily broken.
5. Avoid negative scale values.
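The conditions above can be approximated with a small Python model. It is a hypothetical simplification, not Unity's actual batching logic. It checks only the renderer type, the material instance and mirroring (negative scale), and it treats consecutive compatible renderers in draw order as one batch.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Renderer:
    """Hypothetical, simplified stand-in for a Unity renderer."""
    kind: str             # "MeshRenderer", "SpriteRenderer", "SkinnedMeshRenderer", ...
    material_id: int      # identity of the material instance actually used
    negative_scale: bool = False


BATCHABLE_KINDS = {"MeshRenderer", "TrailRenderer", "LineRenderer",
                   "ParticleSystemRenderer", "SpriteRenderer"}


def can_batch(a: Renderer, b: Renderer) -> bool:
    """Simplified version of the batching conditions listed above."""
    return (a.kind in BATCHABLE_KINDS
            and a.kind == b.kind                    # same renderer type only
            and a.material_id == b.material_id      # same material instance
            and a.negative_scale == b.negative_scale)


def count_batches(draw_order: List[Renderer]) -> int:
    """Consecutive renderers that satisfy can_batch share one batch."""
    batches = 0
    prev = None
    for r in draw_order:
        if prev is None or not can_batch(prev, r):
            batches += 1
        prev = r
    return batches


# sharedMaterial: all three renderers reference the same material instance.
shared = [Renderer("MeshRenderer", material_id=1) for _ in range(3)]
# .material: each renderer received its own copy, so the ids differ.
copies = [Renderer("MeshRenderer", material_id=i) for i in (2, 3, 4)]
# Transparent objects sorted back to front can interleave materials.
transparent = [Renderer("MeshRenderer", 5), Renderer("MeshRenderer", 6),
               Renderer("MeshRenderer", 5)]

print(count_batches(shared), count_batches(copies), count_batches(transparent))  # 1 3 3
```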
 

Static Batching

The purpose of static batching is not to reduce draw calls but to reduce render-state changes. Before an object is rendered, all of its rendering attributes have to be set. Objects in the same batch need them set only once.

Static batching does not necessarily reduce draw calls because objects in a static batch can still be culled individually. It merges only the vertex arrays and keeps the index ranges of the original meshes separate. Unity can therefore decide from the index ranges which submeshes to draw.

For example, suppose 10 meshes are statically batched into one mesh, and the 5th mesh is outside the view frustum. The result is 2 draw calls: one for submeshes 1-4 and one for submeshes 6-10. The 5th submesh is culled. There are 2 draw calls, but the rendering state is set only once.

Without static batching, these 10 meshes would be drawn in 10 draw calls (10 batches), and the rendering state would be set 10 times, even though they share the same material and textures.
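The 10-mesh example can be sketched in Python. The sketch assumes that the combined index buffer stores the submeshes in order, so contiguous visible submeshes can share one draw call. The function name is hypothetical.

```python
def static_batch_draw_ranges(visible):
    """Group visible submeshes of one static batch into contiguous draw ranges.

    visible[i] is True if submesh i+1 survived frustum culling. Assumption:
    adjacent submeshes have adjacent index ranges in the combined index buffer,
    so each contiguous visible run can be drawn with one draw call.
    Returns (draw_ranges, state_setups) using 1-based submesh numbers.
    """
    ranges = []
    start = None
    for i, is_visible in enumerate(visible, start=1):
        if is_visible and start is None:
            start = i
        elif not is_visible and start is not None:
            ranges.append((start, i - 1))
            start = None
    if start is not None:
        ranges.append((start, len(visible)))
    # The whole batch shares one material, so the render state is set once
    # (or not at all if nothing is visible).
    state_setups = 1 if ranges else 0
    return ranges, state_setups


visible = [True] * 10
visible[4] = False  # the 5th mesh is outside the view frustum
print(static_batch_draw_ranges(visible))  # ([(1, 4), (6, 10)], 1)
```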


Additional details:
1. When static batching is done in the editor (objects marked as Static), Unity does not spend any runtime CPU time generating the combined mesh data.
2. Static batching at runtime causes a CPU spike, which can produce a one-off hitch.
3. After static batching, the batched objects are merged into a single static whole, and their Transform properties can no longer be modified.
4. The Transform of the staticBatchRoot object used for the batch can still be modified at runtime.
5. However, when batching at runtime, the meshes must have the Read/Write option enabled.
 

Manual Mesh Merge

Manually merging meshes is similar to static batching, but the merged mesh cannot be culled per submesh. Even if only one of the original pieces is in the field of view, the entire merged mesh is drawn.
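At its core, manual merging concatenates the vertex arrays and offsets the indices. The sketch below is a hypothetical Python illustration of that idea. Unity's Mesh.CombineMeshes does this kind of work for real meshes, but this sketch is not its implementation.

```python
def merge_meshes(meshes):
    """Concatenate meshes into one vertex array and one index array.

    Each mesh is (vertices, indices). Indices of later meshes are shifted by
    the number of vertices already emitted, so they still point at their own
    vertices. The result is a single mesh that is culled and drawn as one unit.
    """
    merged_vertices = []
    merged_indices = []
    for vertices, indices in meshes:
        offset = len(merged_vertices)
        merged_vertices.extend(vertices)
        merged_indices.extend(i + offset for i in indices)
    return merged_vertices, merged_indices


tri_a = ([(0, 0, 0), (1, 0, 0), (0, 1, 0)], [0, 1, 2])
tri_b = ([(5, 0, 0), (6, 0, 0), (5, 1, 0)], [0, 1, 2])
verts, idx = merge_meshes([tri_a, tri_b])
print(len(verts), idx)  # 6 [0, 1, 2, 3, 4, 5]
```

Because the output is one mesh with one index range, there is nothing left to cull per piece. That is the trade-off against static batching described above.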

Dynamic Batching

Unity has two kinds of dynamic batching: one for meshes and one for dynamically generated geometry such as particle systems.

Dynamic batching aims to reduce CPU time. Batching itself also costs CPU time, however, so the conditions for dynamic batching are fairly strict.
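The sketch below shows where dynamic batching spends its CPU time. Every frame, the vertices of each small mesh are transformed into world space on the CPU and copied into one shared buffer. It is a hypothetical Python model. The 900 vertex-attribute limit is an assumption here, so check the Unity manual for the exact rule in your version.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Mesh = Tuple[List[Vec3], Vec3]  # (local vertices, world-space translation)

# Assumed per-mesh limit on vertex attributes (vertex count x attributes per
# vertex); verify the exact rule in the Unity manual for your version.
MAX_VERTEX_ATTRIBUTES = 900


def dynamic_batch(meshes: List[Mesh], attributes_per_vertex: int):
    """Transform small meshes to world space on the CPU and concatenate them.

    Only translation is modeled. Returns (batched_vertices, rejected_indices),
    where rejected meshes are too large to batch and would be drawn separately.
    """
    batched: List[Vec3] = []
    rejected: List[int] = []
    for index, (local_vertices, (ox, oy, oz)) in enumerate(meshes):
        if len(local_vertices) * attributes_per_vertex > MAX_VERTEX_ATTRIBUTES:
            rejected.append(index)
            continue
        # This per-vertex work is repeated every frame: it is the CPU cost
        # that dynamic batching trades for fewer draw calls.
        for x, y, z in local_vertices:
            batched.append((x + ox, y + oy, z + oz))
    return batched, rejected


small: Mesh = ([(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)], (10.0, 0.0, 0.0))
big: Mesh = ([(0.0, 0.0, 0.0)] * 400, (0.0, 0.0, 0.0))

batched, rejected = dynamic_batch([small, small, big], attributes_per_vertex=3)
print(len(batched), rejected)  # 6 [2]
```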

GPU Instancing

Principle:
For all objects that qualify, Unity writes the per-instance data (position, scale, UV offset, lightmap index, etc.) into a constant buffer in one go. When an object enters the rendering pipeline as an instance, its instance ID is used to fetch that instance's data from video memory for the later rendering stages. The data therefore does not have to be sent to the GPU separately for every object, which is where the efficiency gain comes from.
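The instancing principle can be modeled in a few lines of Python. The per-instance data is uploaded once as an array, and the shader side only indexes into it by instance ID. `InstanceData`, `upload_instance_buffer` and `vertex_stage` are hypothetical names for illustration. Real instanced shaders use Unity's instancing macros instead.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class InstanceData:
    """A subset of the per-instance values mentioned in the text."""
    position: Vec3
    scale: float
    uv_offset: Tuple[float, float]  # carried along, unused in this sketch


def upload_instance_buffer(instances: List[InstanceData]) -> List[InstanceData]:
    """Model of the one-time upload: all instance data goes into one buffer."""
    return list(instances)


def vertex_stage(buffer: List[InstanceData], instance_id: int, local_vertex: Vec3) -> Vec3:
    """Model of the shader side: look up this instance's data by instance ID."""
    data = buffer[instance_id]
    x, y, z = local_vertex
    px, py, pz = data.position
    return (x * data.scale + px, y * data.scale + py, z * data.scale + pz)


buffer = upload_instance_buffer([
    InstanceData((0.0, 0.0, 0.0), 1.0, (0.0, 0.0)),
    InstanceData((5.0, 0.0, 0.0), 2.0, (0.5, 0.0)),
])
# One draw call covers both instances; the same mesh runs once per instance ID.
print([vertex_stage(buffer, i, (1.0, 1.0, 0.0)) for i in range(len(buffer))])
# [(1.0, 1.0, 0.0), (7.0, 2.0, 0.0)]
```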

Usage:
1. Check the Enable GPU Instancing option in the material's Inspector.
2. Call Graphics.DrawMeshInstanced or Graphics.DrawMeshInstancedIndirect to issue GPU instanced draws manually.
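Graphics.DrawMeshInstanced draws at most 1023 instances per call, so a large grass field has to be split across several calls. The following is a minimal Python sketch of that chunking. The helper name is hypothetical. In C#, each chunk would be passed as a Matrix4x4 array.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")

# Graphics.DrawMeshInstanced accepts at most 1023 instances per call.
MAX_INSTANCES_PER_CALL = 1023


def chunk_instances(matrices: Sequence[T], limit: int = MAX_INSTANCES_PER_CALL) -> List[List[T]]:
    """Split per-instance matrices into groups that each fit in one call."""
    return [list(matrices[i:i + limit]) for i in range(0, len(matrices), limit)]


grass_matrices = list(range(5000))  # placeholders for 5000 Matrix4x4 values
chunks = chunk_instances(grass_matrices)
print(len(chunks), [len(c) for c in chunks])  # 5 [1023, 1023, 1023, 1023, 908]
```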

MaterialPropertyBlock

Setting a random color through a MaterialPropertyBlock does not break batching. Calling material.SetColor directly does break it, because each renderer then gets its own separate material copy. MaterialPropertyBlock works best with GPU Instancing and is least suitable for the SRP Batcher.
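The difference between the two approaches can be modeled as follows. This is a hypothetical Python stand-in for Renderer.material, Renderer.sharedMaterial and MaterialPropertyBlock, not Unity code. Accessing `material` clones the shared material, so four recolored renderers end up with four materials. Property-block overrides leave a single shared material.

```python
class Material:
    """Hypothetical stand-in for a material asset."""

    def __init__(self, color):
        self.color = color


class Renderer:
    """Hypothetical model of sharedMaterial, material and property blocks."""

    def __init__(self, shared_material):
        self.shared_material = shared_material
        self._own_material = None
        self.property_block = {}  # per-renderer overrides; material stays shared

    @property
    def material(self):
        # Like Renderer.material: the first access clones the shared material.
        if self._own_material is None:
            self._own_material = Material(self.shared_material.color)
        return self._own_material

    def used_material(self):
        return self._own_material or self.shared_material


def distinct_materials(renderers):
    return len({id(r.used_material()) for r in renderers})


base = Material("white")

via_material = [Renderer(base) for _ in range(4)]
for i, r in enumerate(via_material):
    r.material.color = f"color{i}"            # like material.SetColor: one copy each

via_block = [Renderer(base) for _ in range(4)]
for i, r in enumerate(via_block):
    r.property_block["_Color"] = f"color{i}"  # like MaterialPropertyBlock

print(distinct_materials(via_material), distinct_materials(via_block))  # 4 1
```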

Disadvantages: Its priority is relatively low, and submitting a draw call takes slightly longer than usual.

Advantages:

  • Compared with static batching, it adds no extra memory pressure.
  • Compared with dynamic batching, it has no strict vertex limits.
  • It works well with MaterialPropertyBlock without breaking batching.

Applicable scenarios:
Drawing large numbers of objects that share the same mesh, such as grass fields and forests.

SRP Batcher

The SRP Batcher batches materials that use the same shader variant. Even if the materials differ, objects can be batched as long as their shader variant is the same. Once a project uses an SRP, data is passed to the GPU through uniform buffers. With the SRP Batcher enabled, these uniform buffers are generated in advance and uploaded in batches. Because the SRP Batcher batches per shader, it can effectively reduce the number of SetPass calls (render-state setups), which improves CPU performance.
 

Principle:

Without the SRP Batcher, each object has a CBuffer on the GPU. That buffer holds both GameObject properties (such as the transform) and material properties (such as material parameters and lightmaps). Whenever these properties change, the data has to be set again. Each additional material also means setting up its CBuffer again, which costs CPU time.

With the SRP Batcher enabled, the process changes. For a given shader, per-object properties that differ between objects, such as the transform, are stored in one large buffer. Each material's properties are stored in a small per-material buffer. A buffer is updated only when its contents change. If only a transform changes, just the data at that object's offset is rewritten.

When a new material is added but its shader stays the same, the batch does not change.
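The toy Python model below illustrates the persistent-buffer idea. Data stays resident, and each frame only the entries marked dirty are written. `SrpBatcherModel` and its methods are hypothetical names. The real SRP Batcher works with the UnityPerDraw and UnityPerMaterial CBuffers inside the engine.

```python
class SrpBatcherModel:
    """Toy model: persistent per-object and per-material data plus dirty flags."""

    def __init__(self):
        self.per_object = {}      # object_id -> transform (slots in one large buffer)
        self.per_material = {}    # material_id -> properties (small buffer each)
        self.dirty_objects = set()
        self.dirty_materials = set()
        self.uploads = 0          # total number of buffer writes performed

    def set_transform(self, object_id, transform):
        self.per_object[object_id] = transform
        self.dirty_objects.add(object_id)

    def set_material(self, material_id, properties):
        self.per_material[material_id] = properties
        self.dirty_materials.add(material_id)

    def render_frame(self):
        # Only changed entries are written; unchanged data stays on the GPU.
        self.uploads += len(self.dirty_objects) + len(self.dirty_materials)
        self.dirty_objects.clear()
        self.dirty_materials.clear()


batcher = SrpBatcherModel()
for obj in range(100):
    batcher.set_transform(obj, (obj, 0, 0))
batcher.set_material("grass", {"_Color": "green"})
batcher.render_frame()             # first frame: 100 objects + 1 material = 101 writes
batcher.set_transform(7, (7, 1, 0))
batcher.render_frame()             # next frame: only object 7's slot is rewritten
print(batcher.uploads)  # 102
```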

Traditionally, CPU optimization has focused on reducing the number of draw calls. A draw call itself, however, is just a few bytes pushed into the GPU command buffer. The real CPU cost comes from the setup work before each draw call. The SRP Batcher does not reduce the number of draw calls; it reduces the setup cost between them.

Rendering pipeline requirements:
URP, HDRP or a custom SRP; the built-in pipeline is not supported.
Game object requirements:
must have a Mesh or Skinned Mesh (particles are not supported)
must not use MaterialPropertyBlock
the shader must be SRP Batcher compatible
 

Advantages:
It saves uniform buffer writes and supports dynamic objects. It also covers a wider range of cases than static batching at a much smaller memory cost, and it works even when there are many different materials.

Applicable scenarios:
Projects where the same shaders are reused heavily, although the number of shader variants still needs to be kept under control.
 

Comparison of the four methods

Priority:
SRP Batcher / Static Batching > GPU Instancing > Dynamic Batching
Applicable situations:
Static Batching + SRP Batcher: main city and dungeon buildings
SRP Batcher only: many different kinds of vegetation
GPU Instancing: a single kind of vegetation
Dynamic Batching: UI, particles, sprites, etc.

Culling

Before the GPU renders, the CPU has to pass rendering data to it, so objects that do not need to be rendered should be removed first. This step is called culling. Unity natively supports view frustum culling, which removes objects outside the view volume. The data for those objects never has to be passed to the GPU.
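Frustum culling of bounding volumes can be sketched as a plane test: an object is culled if its bounding sphere lies entirely behind any of the frustum's six planes. This is a generic Python sketch, not Unity's internal implementation. For simplicity, the example "frustum" is an axis-aligned box, whereas a real camera frustum has tilted side planes.

```python
from typing import List, Tuple

Plane = Tuple[float, float, float, float]   # (nx, ny, nz, d); normal points inward
Sphere = Tuple[float, float, float, float]  # (cx, cy, cz, radius)


def sphere_in_frustum(sphere: Sphere, planes: List[Plane]) -> bool:
    """A sphere is culled if it lies completely behind any frustum plane."""
    cx, cy, cz, r = sphere
    for nx, ny, nz, d in planes:
        distance = nx * cx + ny * cy + nz * cz + d  # signed distance to the plane
        if distance < -r:
            return False
    return True


# Hypothetical box-shaped "frustum": -10 <= x, y <= 10 and 0.1 <= z <= 100.
planes: List[Plane] = [
    (1.0, 0.0, 0.0, 10.0), (-1.0, 0.0, 0.0, 10.0),
    (0.0, 1.0, 0.0, 10.0), (0.0, -1.0, 0.0, 10.0),
    (0.0, 0.0, 1.0, -0.1), (0.0, 0.0, -1.0, 100.0),
]
objects = {
    "inside": (0.0, 0.0, 50.0, 1.0),
    "behind": (0.0, 0.0, -5.0, 1.0),   # behind the near plane: culled
    "edge": (10.5, 0.0, 50.0, 1.0),    # straddles the x = 10 plane: kept
}
visible = [name for name, s in objects.items() if sphere_in_frustum(s, planes)]
print(visible)  # ['inside', 'edge']
```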

In Unity, all visible content is drawn by components derived from Renderer, such as MeshRenderer, SpriteRenderer, LineRenderer, SkinnedMeshRenderer and TrailRenderer. Unity filters these during rendering and performs frustum culling on them automatically.

If there are many active cameras in the scene, the total time spent on culling increases accordingly. Culling runs for every active camera, even one that is not used to display objects. In the Profiler this shows up under Camera.Render on the render thread.

CullingGroup

CullingGroup is an API provided by Unity. It uses the same system as Unity's own culling and LOD, and it exposes some of the underlying culling functionality to users. See Unity - Manual: CullingGroup API.
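CullingGroup lets you register bounding spheres and distance thresholds (CullingGroup.SetBoundingSpheres, CullingGroup.SetBoundingDistances). It then reports changes in visibility and distance band. The Python sketch below only models how a distance band could be computed from a list of ascending thresholds. The function name and the threshold values are hypothetical.

```python
import bisect
import math


def distance_band(position, reference, thresholds):
    """Return the index of the distance band that position falls into.

    thresholds are ascending distances: band 0 is closer than thresholds[0],
    band i covers [thresholds[i-1], thresholds[i]), and the last band covers
    everything beyond the final threshold.
    """
    distance = math.dist(position, reference)
    return bisect.bisect_right(thresholds, distance)


thresholds = [10.0, 30.0, 60.0]  # hypothetical near / mid / far cut-offs
camera = (0.0, 0.0, 0.0)
enemies = {"a": (3.0, 4.0, 0.0), "b": (0.0, 20.0, 0.0), "c": (100.0, 0.0, 0.0)}
bands = {name: distance_band(p, camera, thresholds) for name, p in enemies.items()}
print(bands)  # {'a': 0, 'b': 1, 'c': 3}
```

A game can then, for example, update far-band objects less often and react only when an object's band changes, rather than recomputing everything every frame.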

Occlusion

Basic introduction
Every frame, the camera performs culling operations that check the renderers in the scene and exclude (cull) those that do not need to be drawn. By default, the camera performs frustum culling.

How it works:
Data about the scene is generated in the Unity Editor and then used at runtime to determine what the camera can see. Generating this data is called baking.
When baking occlusion culling data, Unity divides the scene into cells and generates data that describes the geometry inside each cell and the visibility between adjacent cells. Unity then merges cells where possible to reduce the size of the generated data. At runtime, Unity loads the baked data into memory. For each camera with Occlusion Culling enabled, it queries the data to determine what that camera can see.
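The bake-then-query idea can be illustrated with a simplified potentially-visible-set lookup in Python. This is not Unity's data format, since Unity uses the Umbra middleware with portal-based visibility. The grid cells and the hand-written visibility table stand in for baked data and are purely hypothetical.

```python
from typing import Dict, Set, Tuple

Cell = Tuple[int, int]
Point = Tuple[float, float]


def cell_of(position: Point, cell_size: float) -> Cell:
    """Map a 2D (x, z) position to the grid cell that contains it."""
    x, z = position
    return (int(x // cell_size), int(z // cell_size))


def visible_objects(camera_pos: Point, cell_size: float,
                    visibility: Dict[Cell, Set[Cell]],
                    objects: Dict[str, Point]) -> Set[str]:
    """Runtime query: keep objects whose cell is visible from the camera's cell."""
    visible_cells = visibility.get(cell_of(camera_pos, cell_size), set())
    return {name for name, pos in objects.items()
            if cell_of(pos, cell_size) in visible_cells}


# Hypothetical "baked" data: three cells in a row, with a wall between
# cell (1, 0) and cell (2, 0) so they cannot see each other.
visibility = {
    (0, 0): {(0, 0), (1, 0)},
    (1, 0): {(0, 0), (1, 0)},
    (2, 0): {(2, 0)},
}
objects = {"crate": (5.0, 5.0), "statue": (15.0, 5.0), "tower": (25.0, 5.0)}

print(sorted(visible_objects((2.0, 2.0), 10.0, visibility, objects)))
# ['crate', 'statue']
```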

In the Profiler, the CullQueryPortalVisibilityUmbra function appears under CullSendEvents, and during testing it also showed up on worker threads.

Recommendations

Occluders:

  • Large occluders, such as mountains, give good occlusion.
  • Objects that only block the view when combined, such as a forest, are not suitable, because occlusion does not accumulate across objects.
  • Avoid occluders with many gaps, like a piece of cheese.
  • When modeling, take care not to leave unintentional gaps.
  • Try to keep the camera from entering the inside of an occluder; colliders can be used to enforce this.

Occludees:

  • Most objects can be marked as occludees so that they can be culled.
  • Very large objects, such as terrain, are not suitable as occludees because some part of them is almost always visible. Consider splitting them into multiple parts.

Source: blog.csdn.net/qq_37672438/article/details/132003379