Unity rendering performance analysis tool

Target

Before optimizing, you need a goal:
PC: render at 60 frames per second
Mobile: render at 30 frames per second
Treat these as minimums. If the frame rate fluctuates while the game is running, players can clearly see and feel the drops.
The first rule of optimization is to find where the performance problem actually is.
Generally, the bottleneck is either on the CPU or on the GPU.
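
The frame rate targets above translate directly into a time budget per frame, which is the number you compare against in a profiler. A minimal sketch of that arithmetic (the function names are made up for illustration):

```python
def frame_budget_ms(target_fps: float) -> float:
    """Return the time available per frame, in milliseconds."""
    if target_fps <= 0:
        raise ValueError("target_fps must be positive")
    return 1000.0 / target_fps


def meets_target(frame_time_ms: float, target_fps: float) -> bool:
    """True if a measured frame time fits inside the budget for target_fps."""
    return frame_time_ms <= frame_budget_ms(target_fps)


if __name__ == "__main__":
    for label, fps in (("PC", 60), ("Mobile", 30)):
        print(f"{label}: {fps} fps -> {frame_budget_ms(fps):.2f} ms per frame")
```

So a 60 fps target leaves about 16.67 ms per frame, and a 30 fps target about 33.33 ms.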

Profiler

Unity has a built-in performance analysis tool, the Profiler, which can be opened from Window->Analysis->Profiler. Because measurements taken inside the editor are not accurate, it is generally recommended to build the game and profile the build. When building for profiling, remember to enable Development Build, together with Autoconnect Profiler and Deep Profiling Support. That way, once the build finishes and the scene opens, Unity connects to the Profiler automatically and we do not have to connect by hand.
The Profiler chart shows the time taken by each frame once the game has run for a number of frames.
We can also switch to Hierarchy mode here to see the time consumed by each function. When you select a function, the chart above highlights the related portion.
In the chart you can see the time taken by each category: the categories are listed on the left, and the change in their time from frame to frame is plotted on the right. Click the chart to select a frame.
For the selected frame, the timeline is divided into three main parts. The total time of this frame is 16.74 ms. The left part is the time spent in script logic; my test project has very little logic, so its cost is tiny. The middle part is the time spent issuing rendering work, here 2.15 ms. The last part is synchronization, at 13.61 ms. In other words, the machine spends most of the frame waiting and can comfortably hold the full frame rate.
If the problem is in the scripts, ask a programmer to investigate. A technical artist (TA) generally needs to check the two parts on the right: rendering and synchronization.
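
The three figures quoted for this frame can be turned into a share of the frame per part. The sketch below assumes the three parts add up to the total, so the logic time is simply the remainder (roughly 0.98 ms); the function name is hypothetical.

```python
TOTAL_MS = 16.74   # total frame time quoted in the text
RENDER_MS = 2.15   # time spent issuing rendering work
SYNC_MS = 13.61    # time spent in synchronization


def breakdown(total_ms: float, render_ms: float, sync_ms: float) -> dict:
    """Return {part: (milliseconds, percent of frame)} for the three parts.

    Assumption: logic time is whatever is left after rendering and sync.
    """
    logic_ms = total_ms - render_ms - sync_ms
    parts = {
        "logic": logic_ms,
        "rendering": render_ms,
        "synchronization": sync_ms,
    }
    return {name: (ms, 100.0 * ms / total_ms) for name, ms in parts.items()}


if __name__ == "__main__":
    for name, (ms, pct) in breakdown(TOTAL_MS, RENDER_MS, SYNC_MS).items():
        print(f"{name:16s} {ms:6.2f} ms  {pct:5.1f} %")
```

Most of the frame is spent in synchronization, which is why the waiting time is the first thing to look at.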
With multi-threaded rendering, the Profiler also shows the render thread separately, so we can inspect what it is doing. The gray part is the time spent waiting for the main thread, labeled Gfx.WaitForGfxCommandsFromMainThread, and the blue part is the time spent submitting batches during actual rendering. In the URP rendering pipeline, post-processing operations are done on the main thread.
As the capture shows, I had vertical synchronization (VSync) turned on, which is why the synchronization time is so long. When troubleshooting, it is recommended to turn VSync off so the game runs as fast as the hardware allows.
With VSync turned off, the Profiler shows the real rendering time of each frame. The screen may now show tearing; preventing that is exactly what VSync is for. While debugging, we can leave it off.
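
To see why VSync inflates the measured frame time, here is a simplified model, assuming classic double-buffered VSync where every frame waits for the next vertical blank. Real drivers (triple buffering, adaptive sync) behave differently, and the function names are made up for illustration.

```python
import math


def vsync_frame_time_ms(work_ms: float, refresh_hz: float = 60.0) -> float:
    """Frame time as displayed when each frame waits for the next refresh."""
    interval = 1000.0 / refresh_hz
    # A frame occupies a whole number of refresh intervals, at least one.
    intervals = max(1, math.ceil(work_ms / interval))
    return intervals * interval


def vsync_wait_ms(work_ms: float, refresh_hz: float = 60.0) -> float:
    """Idle time added by VSync on top of the real work."""
    return vsync_frame_time_ms(work_ms, refresh_hz) - work_ms


if __name__ == "__main__":
    for work in (3.13, 17.0):
        total = vsync_frame_time_ms(work)
        print(f"work {work:5.2f} ms -> frame {total:5.2f} ms, "
              f"wait {vsync_wait_ms(work):5.2f} ms")
```

Under this model, about 3 ms of real work on a 60 Hz display still produces a frame of roughly 16.67 ms, with the rest spent waiting. Work that slightly exceeds one refresh interval costs two intervals, which is why the profile with VSync on hides the real cost.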

Render every frame

On the CPU, the work that has to be done every frame:

  1. Logic related: scripting, physics, animation
  2. Rendering: culling, sorting, drawing
    • A DrawCall packages the data of a single object together with its rendering information (textures, matrices, and so on) and then submits the render command.
    • A SetPass call sets all the render state data used to render a mesh's material.
    • A Batch is a buffer packet containing shared vertex and index data. The vertex data does not have to be resubmitted, so it is very fast. The point of batching is to reduce render state switches: it does not reduce the number of DrawCalls, but it does reduce the other state changes. Relatively speaking, a DrawCall costs less time than a SetPass call.
  3. Synchronization: synchronization problems usually come from vertical synchronization or frame rate limits, and the CPU sits in a waiting state while synchronizing.
    • Vertical synchronization means that when your rendering frame rate is higher than the display's refresh rate, the frame rate is automatically capped to keep the display in sync.
    • A frame rate limit likewise keeps the frame rate steady and avoids tearing.
    • If WaitForTargetFPS appears in the synchronization part, the wait is caused by vertical synchronization; it is recommended to disable it while debugging.
    • If GfxDeviceD3D11.WaitForLastPresent appears, all CPU threads have finished their work and are waiting for the GPU, so there may be a GPU performance bottleneck.
    • If Gfx.WaitForPresentOnGfxThread appears, the main thread has finished its non-rendering work and is waiting for the render thread, which has not finished yet. There are two cases:
      1. If the render thread is running Camera.Render at that moment and Camera.Render takes too long, the bottleneck is in the rendering work on the CPU.
      2. If the render thread is running Gfx.PresentFrame at that moment, the bottleneck is on the GPU side.
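
The marker rules in the list above can be written down as a small lookup. This is only a sketch of the reasoning in this article, not a Unity API: the function name and return strings are hypothetical, and the marker names are the ones quoted above.

```python
from typing import Optional


def diagnose_sync_marker(marker: str,
                         render_thread_marker: Optional[str] = None) -> str:
    """Map a main-thread wait marker to the likely bottleneck."""
    if marker == "WaitForTargetFPS":
        return "VSync or frame rate cap: disable it while profiling"
    if marker == "GfxDeviceD3D11.WaitForLastPresent":
        return "CPU is waiting for the GPU: possible GPU bottleneck"
    if marker == "Gfx.WaitForPresentOnGfxThread":
        # The main thread waits for the render thread, so look at what
        # the render thread is doing at that moment.
        if render_thread_marker == "Camera.Render":
            return "CPU rendering bottleneck: Camera.Render takes too long"
        if render_thread_marker == "Gfx.PresentFrame":
            return "GPU bottleneck: render thread is in Gfx.PresentFrame"
        return "Main thread waits for render thread: inspect the render thread"
    return "Unknown marker: no rule in this sketch"


if __name__ == "__main__":
    print(diagnose_sync_marker("WaitForTargetFPS"))
    print(diagnose_sync_marker("Gfx.WaitForPresentOnGfxThread", "Camera.Render"))
```

The second argument matters only for Gfx.WaitForPresentOnGfxThread, where the verdict depends on the render thread's current sample.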

On the GPU, rendering efficiency is governed mainly by the pixel fill rate: fill rate = screen pixels × shader complexity × overdraw. The main factors that affect efficiency are:

  1. Screen Resolution
  2. Post-processing effects
  3. Shader complexity
  4. Overdraw: the same screen pixel is drawn multiple times, typically by overlapping translucent objects.
  5. Bandwidth bottleneck: memory bandwidth is the rate at which the GPU can read from and write to memory. When the amount of data the GPU is rendering is too large, memory cannot deliver it to the GPU in time, and the GPU stalls waiting. A common case is deferred rendering, where the G-buffer and various other buffers and render targets (RTs) stay resident, take up a lot of memory, and are constantly read and written. Mobile GPUs have relatively low bandwidth and texture processing capability, so this deserves attention; it is why deferred rendering is rarely used on mobile.
  6. The number of triangles and vertices on screen. Vertices matter less than pixels, as a quick calculation shows: a 1k image has about 1 million pixels, and we rarely use models with 1 million vertices.
To sum up: on the CPU side, the synchronization functions tell you where the problem lies. On the GPU side, the cost is driven mainly by the number of pixels to shade. Screen resolution is the main factor, because the pixel count grows with width times height (doubling both quadruples it), and post-processing cost also scales with screen resolution. Translucent objects are expensive because every pixel they cover has to be drawn: they are rendered from far to near relative to the camera, and a translucent surface does not occlude what lies behind it. The GPU's workload is very large, since every pixel has to run the fragment shader at least once.
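
The fill rate relation (screen pixels × shader complexity × overdraw) and the pixel-versus-vertex comparison can be checked with a few lines of arithmetic. This is a rough sketch: `shader_cost` and `overdraw` are unitless illustrative factors, not values measured from any GPU.

```python
def fragment_work(width: int, height: int,
                  shader_cost: float = 1.0, overdraw: float = 1.0) -> float:
    """Relative per-frame pixel workload: pixels x shader cost x overdraw."""
    return width * height * shader_cost * overdraw


if __name__ == "__main__":
    full_hd = fragment_work(1920, 1080)
    uhd = fragment_work(3840, 2160)
    print(f"1920x1080: {full_hd:,.0f} pixel shades per frame")
    print(f"3840x2160: {uhd:,.0f} ({uhd / full_hd:.0f}x the work)")

    # Three layers of overdraw triple the work at the same resolution.
    layered = fragment_work(1920, 1080, overdraw=3.0)
    print(f"1920x1080 with 3x overdraw: {layered:,.0f}")

    # A 1k (1024x1024) image already has over a million pixels, far more
    # than the vertex count of a typical model.
    print(f"1024x1024: {fragment_work(1024, 1024):,.0f} pixels")
```

Doubling the resolution in both directions quadruples the workload, which is why lowering the render resolution is often the most effective GPU-side change.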

Performance analysis tools

  1. Unity's built-in Profiler mentioned above
  2. Frame Debugger: mainly for debugging rendering results; shows what is rendered at each step of the current frame
  3. FPS Counter: a scene component that can be added directly to the scene to show the rendering status
  4. UPR: the performance tool officially provided by Unity. UWA: a third-party company specializing in performance analysis
  5. RenderDoc: a frame capture tool. Xcode: the debugging tool for the iOS platform, generally used for debugging on Apple phones

Scene optimization

  1. Keep the scene structure and hierarchy simple, and place dynamically generated objects directly under the Root.
  2. Prefer Prefabs over plain GameObjects.
  3. Use a common set of shaders so that objects share the same shader, which is the precondition for batching.
  4. A lightmap size of 2048 is recommended; too many lightmaps hurt batching.
  5. Check ReflectionProbes, which also affect batching.
  6. For static objects, share materials as much as possible and merge their textures.
  7. Use GPU Instancing for large numbers of trees, grass, and stones.
  8. Check whether the remaining objects can be batched by the SRP Batcher.
  9. Check whether the final resource usage is too high.
  10. Decide whether to use LOD based on the number of polygons on screen.
  11. Optimize the scene's shaders and its lighting and shadow settings.
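
Several items in the list come down to the same idea: objects that share a shader or material can be drawn back to back without changing render state. The sketch below counts material changes along a draw list. It is an illustration of the idea only, not how Unity's batching is implemented, and the material names are hypothetical.

```python
from itertools import groupby
from typing import Iterable


def count_material_switches(draw_order: Iterable[str]) -> int:
    """Count how many times a material must be bound for a draw list.

    Consecutive draws with the same material form one run, and each run
    needs a single bind.
    """
    return sum(1 for _ in groupby(draw_order))


if __name__ == "__main__":
    draws = ["rock", "tree", "rock", "grass", "tree", "rock"]
    print("as submitted:", count_material_switches(draws))
    print("grouped by material:", count_material_switches(sorted(draws)))
    # If every object used one shared material, a single bind would do.
    print("one shared material:", count_material_switches(["shared"] * 6))
```

With six draws in mixed order the material changes six times; grouping by material brings that down to three, and a single shared material to one.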

Origin blog.csdn.net/qq_30100043/article/details/130440015