Shader optimization scheme: shader variants

Unity's shader pipeline is very much a black box. It has troubled many developers, and it troubled me for a long time. First of all, we need to understand what happens to a Unity shader between being written and finally being executed by the GPU.

We write shaders in HLSL. At packaging time, shader code for the target platform is generated from it: for example, GLSL for Android (OpenGL ES) and MSL for iOS (Metal). Whether you build the whole player or an AssetBundle, it is the GLSL or MSL that goes into the package. This also explains why the original HLSL code cannot be seen when a shader is unpacked with a tool such as AssetStudio. If there are many variants, generating the GLSL or MSL is slow and packaging time increases.

The size of the GLSL and MSL determines the ShaderLab memory shown in the Profiler. To reduce that memory, you usually start by reducing shader macros. When a shader needs to be rendered, the OpenGL ES or Metal driver compiles it at runtime. This is the Shader.Parse and Shader.CreateGPUProgram you can see in the Profiler, and only after it can the GPU actually execute the shader. If too many shaders have to be compiled, the game will certainly stutter.

The difference between the GPU and the CPU is that CPU code is basically compiled ahead of time into a .so for x86 or ARM and is not compiled again at runtime (JIT aside). And the GPU? In principle, as long as you can supply binary code that the GPU understands, shaders could be compiled ahead of time too. However, there are many kinds of GPU, so compiling for all of them in advance would be wasteful, and which shaders are needed depends on how the game actually runs: not every shader ends up being executed. So at present everything is compiled at runtime.

Note the two problems highlighted above (marked in red in the original): 1. slow packaging, and 2. runtime stutter. The rest of this article expands on these two points.

1. Slow packaging

Why is packaging slow? Isn't it just turning HLSL text files into GLSL or MSL text files? Writing files is about the simplest thing there is, so what could be slow about it? Indeed, with few variants it is not slow at all, but once the number of variants passes a certain point it becomes overwhelming. As shown in the figure below, the shader that comes with URP already contains 3072 variants even after the unused variants are stripped.

Isn't that just 3072 shaders to generate? Surely that will not stall packaging. Wrong, try again. This time we uncheck Skip unused shader_features.

Now there are 6.29 million variants. That means generating 6.29 million shader files, and this is just one shader. (Try generating 6 million text files and see whether your computer hangs.) It would be hard for packaging not to be slow.

#pragma shader_feature _ _TEST_ON // compiled only when used
#pragma multi_compile _ _TEST_ON // always compiled

When writing a shader, macros that need to be switched at runtime are usually declared with multi_compile, and macros whose state can be fixed before packaging are declared with shader_feature. This rule is correct and must be strictly enforced.
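To see why the numbers above explode, it helps to do the arithmetic: every #pragma line multiplies the variant count by the number of alternatives it declares. The following is a minimal sketch of that calculation in plain C#. The class and method names are hypothetical, and it ignores Unity's stripping, so it gives an upper bound rather than the exact count Unity reports.

```csharp
using System;
using System.Linq;

public static class VariantMath
{
    // Each argument is the number of alternatives on one #pragma line.
    // "#pragma multi_compile _ _TEST_ON" has 2 alternatives (off and on).
    public static long CountVariants(params int[] optionsPerPragma)
    {
        // The total is the product of the per-line alternatives.
        return optionsPerPragma.Aggregate(1L, (total, n) => total * n);
    }
}
```

For example, VariantMath.CountVariants(2, 2, 2) gives 8, and ten two-way lines already give 1024, before counting passes and shader stages.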

As shown in the figure below, Unity opens multiple threads to compile GLSL and MSL in parallel in order to speed up packaging. Since Unity 2018, user-defined shader variant stripping has also been available, which is what the description at the bottom of the figure refers to.

Packaging happens in the Editor, in edit mode, and that environment is actually quite complicated. It is entirely possible that a macro declared with #pragma shader_feature is, for reasons unknown, in use somewhere in edit mode (though not in the real game) and is therefore ruthlessly packed into the build, which increases both the ShaderLab memory in the Profiler and the packaging time. So it is best to implement variant stripping yourself in the project.

As shown in the following code, the _TEST_ON macro is forcibly stripped when building the player or an AssetBundle. The code is fairly simple and needs no explanation.
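A minimal sketch of what such a stripper could look like, assuming Unity's IPreprocessShaders editor callback. The class name is hypothetical, and this is not necessarily the code the original article used. It treats _TEST_ON as a global keyword; a local keyword (shader_feature_local) may need a different ShaderKeyword constructor depending on the Unity version.

```csharp
using System.Collections.Generic;
using UnityEditor.Build;
using UnityEditor.Rendering;
using UnityEngine;
using UnityEngine.Rendering;

// Editor-only script: place it under an "Editor" folder.
// Unity calls OnProcessShader during player and AssetBundle builds.
public class StripTestOnVariants : IPreprocessShaders
{
    // Keyword name taken from the article; the class name is hypothetical.
    private static readonly ShaderKeyword TestOn = new ShaderKeyword("_TEST_ON");

    public int callbackOrder => 0;

    public void OnProcessShader(Shader shader, ShaderSnippetData snippet,
                                IList<ShaderCompilerData> data)
    {
        // Iterate backwards so RemoveAt does not shift unvisited entries.
        for (int i = data.Count - 1; i >= 0; --i)
        {
            if (data[i].shaderKeywordSet.IsEnabled(TestOn))
            {
                data.RemoveAt(i); // this variant will not be compiled
            }
        }
    }
}
```

A real project would probably filter on shader.name as well, so that only the intended shaders lose the variant.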

 

There is one more pitfall behind slow packaging. As shown in the figure below, you must never put the shader in the built-in shader settings (the Always Included Shaders list in Graphics Settings). If you do, every variant is forcibly packaged, whether it is declared with #pragma shader_feature or #pragma multi_compile. Even the stripping method above cannot strip it, so package size and ShaderLab memory grow ruthlessly. Worse, if the same shader is also built into an AssetBundle, a second copy is packaged, which is a complete waste.

The engine also has built-in stripping. For example, if the game does not use real-time lights or fog, those variants can be stripped. But it is still a black box after all, so I suggest writing the shader stripping yourself.

2. Runtime stutter

The most typical cause of runtime stutter is a shader that has been packaged repeatedly. For example, the special-effects shader is used by every effect, but because the effects are packaged separately, each AssetBundle contains its own copy of that shader. Every time an effect is loaded, the shader has to be compiled again, which causes a stutter.

Be sure to package shaders as a dependency: put all shaders in one AssetBundle and have every other AssetBundle depend on it. This leads to a new pitfall. Because the shaders use #pragma shader_feature, the macros in the dependency bundle are all stripped, since nothing inside that bundle uses them. At this point many people switch to #pragma multi_compile, which brings back the problems described above: slow packaging and high memory.
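A minimal editor sketch of assigning every shader to one shared bundle, so that other bundles reference it as a dependency instead of each carrying a copy. The bundle name, menu path, and class name are assumptions for illustration; a real project may drive this from its own build pipeline instead of bundle names on importers.

```csharp
using UnityEditor;

// Editor-only script: place it under an "Editor" folder.
public static class ShaderBundleAssigner
{
    // Hypothetical bundle name for the shared shader bundle.
    private const string ShaderBundleName = "shaders";

    [MenuItem("Tools/Assign Shaders To Shared Bundle")]
    public static void AssignAll()
    {
        // "t:Shader" matches shader assets; the search is limited to Assets/.
        foreach (string guid in AssetDatabase.FindAssets("t:Shader", new[] { "Assets" }))
        {
            string path = AssetDatabase.GUIDToAssetPath(guid);
            AssetImporter importer = AssetImporter.GetAtPath(path);
            if (importer != null && importer.assetBundleName != ShaderBundleName)
            {
                // An explicitly assigned asset is not duplicated into other bundles.
                importer.assetBundleName = ShaderBundleName;
            }
        }
    }
}
```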

Finally, here is the approach I use in my project

The project forbids #pragma multi_compile; #pragma shader_feature must be used. I wrote a tool that batch-extracts the variant combinations of all materials involved in packaging. It extracts only the macros that are actually in use on each material, because artists replace the shaders on materials, which leaves stale macros behind. You can refer to my article, Unity3D Research Institute's Delete Material Residual Macros and Properties after Shader Replacement (125). After extraction, the tool generates the variant collection file ShaderVariants.shadervariants. Finally, the variant collection is packaged together with all the shaders, so that the variants are not stripped.
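A minimal sketch of such an extraction tool, using Unity's ShaderVariantCollection API. Several things here are assumptions rather than the author's actual tool: it scans every material under Assets instead of only those involved in packaging, it assumes stale keywords have already been cleaned from the materials, and it records a single pass type that suits a scriptable render pipeline such as URP.

```csharp
using UnityEditor;
using UnityEngine;
using UnityEngine.Rendering;

// Editor-only script: place it under an "Editor" folder.
public static class VariantCollectionBuilder
{
    [MenuItem("Tools/Build Shader Variant Collection")]
    public static void Build()
    {
        var collection = new ShaderVariantCollection();

        foreach (string guid in AssetDatabase.FindAssets("t:Material", new[] { "Assets" }))
        {
            string path = AssetDatabase.GUIDToAssetPath(guid);
            var material = AssetDatabase.LoadAssetAtPath<Material>(path);
            if (material == null || material.shader == null) continue;

            // Setting the fields directly skips the validity check that the
            // ShaderVariant constructor performs.
            var variant = new ShaderVariantCollection.ShaderVariant
            {
                shader = material.shader,
                passType = PassType.ScriptableRenderPipeline, // assumption: SRP/URP
                keywords = material.shaderKeywords
            };
            collection.Add(variant); // returns false if already present
        }

        AssetDatabase.CreateAsset(collection, "Assets/ShaderVariants.shadervariants");
        AssetDatabase.SaveAssets();
    }
}
```

The generated asset would then be assigned to the same AssetBundle as the shaders.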

Static analysis with the method above is usually enough, but some cases are dynamic, such as enabling or disabling a macro at runtime, and these cannot be analyzed statically. I previously wrote a tool that combines the dynamic macros with every statically collected macro combination.

For example, suppose C is enabled and disabled dynamically:

A B

A B C

The combinations that include C are generated automatically from the statically collected combination of A and B. In the end I found that if there are many macros that need to be enabled dynamically, and they also have mutually exclusive or inclusive (and/or) conditions between them, the automatically generated variant collection is still very large. So in the end I chose branch prediction, that is, a runtime if on a shader property instead of a compile-time macro.
I tested branch prediction in detail in my article ARM Mobile Studio performance optimization (3).
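The expansion step described above can be sketched in plain C# as follows. This is an illustration under assumptions, not the author's tool: the class and method names are hypothetical, and it ignores the mutually exclusive and and/or conditions mentioned above, simply switching on every subset of the dynamic macros.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class VariantExpander
{
    // For one statically collected keyword set, return that set plus every
    // set obtained by switching on any subset of the dynamic keywords.
    public static List<string[]> Expand(string[] staticKeywords, string[] dynamicKeywords)
    {
        var result = new List<string[]>();
        int subsetCount = 1 << dynamicKeywords.Length; // 2^n subsets
        for (int mask = 0; mask < subsetCount; mask++)
        {
            // Bit i of mask decides whether dynamic keyword i is enabled.
            var enabled = dynamicKeywords.Where((_, i) => (mask & (1 << i)) != 0);
            result.Add(staticKeywords.Concat(enabled).ToArray());
        }
        return result;
    }
}
```

Calling VariantExpander.Expand(new[] { "A", "B" }, new[] { "C" }) yields the two sets A B and A B C. Each additional dynamic macro doubles the output, which is why the generated collection grows so quickly.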

Some effects in the game, such as distortion, dissolve, and flickering rim light, may be combined with any other macro and need to be enabled or disabled dynamically from code. Rim light (the Fresnel effect) not only has to be enabled dynamically but also needs its intensity set dynamically when required. There is no good solution for effects like these: their combinations cannot be determined offline, so global branch prediction can be considered for them.

There is also the kind of effect that is only activated in special situations, where it can be determined at edit time whether it will ever be needed. In that case the macro can be enabled offline and local branch prediction used inside it, because in most cases the macro _TEST_ON will not be enabled at all.

#if defined(_TEST_ON)
if (_Test)
{
    // do something
}
#endif

Because the macro can be determined offline and is already enabled, the effect can be switched on or off at runtime simply by setting a variable inside it.

material.SetFloat("_Test", 1f);

Reprinted from Mr. Yusong: The Shader optimization scheme I currently use for Unity3D | Yusong MOMO Program Research Institute https://www.xuanyusong.com/archives/4902

Origin blog.csdn.net/Ling_SevoL_Y/article/details/130135186