Optimizing Unity UI (3): Unity UI Profiling Tools

Checked with version: 2017.3 - Difficulty: Advanced

Several profiling tools are useful for analyzing the performance of Unity UI. The main tools are:

  • Unity Profiler
  • Unity Frame Debugger
  • Xcode’s Instruments or Intel VTune
  • Xcode’s Frame Debugger or Intel GPA

External tools provide method-level CPU profiling with millisecond (or better) resolution, as well as detailed draw-call and shader profiling. Instructions for setting up and using the above tools are beyond the scope of this guide. Note that Xcode's Frame Debugger and Instruments can only be used for IL2CPP builds on Apple platforms, so currently they can only be used to profile iOS builds.

Unity Profiler

The main use of the Unity Profiler is comparative profiling: enabling and disabling UI elements while the Unity Profiler is running can quickly narrow down the portion of the UI hierarchy that is most responsible for performance issues.

To analyze this, observe the Canvas.BuildBatch and Canvas.SendWillRenderCanvases lines in the profiler output.

As mentioned earlier, Canvas.BuildBatch is the native-code calculation that performs the Canvas Batch Build process.

Canvas.SendWillRenderCanvases contains the calls to C# scripts that subscribe to the Canvas component's willRenderCanvases event. Unity UI's CanvasUpdateRegistry class receives this event and uses it to run the Rebuild process, as described earlier. Any dirty UI components are expected to update their Canvas Renderers at this time.

Note: To make it easier to see differences in UI performance, it is usually advisable to disable all trace categories except "Rendering", "Scripts" and "UI". This can be done by clicking the colored box next to the trace category's name on the left side of the CPU Usage profiler. You can also reorder the categories in the CPU Usage profiler by clicking and dragging a category name up or down.

The UI category is new in Unity 2017.1 and later. Unfortunately, some parts of the UI update process are not categorized correctly, so be careful when reading the UI curve, as it may not contain all UI-related calls. For example, Canvas.SendWillRenderCanvases is categorized as "UI", while Canvas.BuildBatch is categorized as "Others" and "Rendering".

In 2017.1 and above, there is also a new UI Profiler. By default, this profiler is the last one in the Profiler window. It consists of two timelines and a batch viewer:

The first timeline shows the CPU time spent in two categories, calculating layouts and rendering. Note that it suffers from the same categorization problem described earlier, so some UI work may not be accounted for.

The second timeline shows the total number of batches, vertices, and event markers. In the previous screenshot, you can see several button click events. These markers can help you determine the cause of the CPU spike.

Finally, the most useful feature of the UI Profiler is the batch viewer at the bottom. On the left is a tree view of all your canvases and, under each canvas, a list of the batches it generates. The columns give interesting details about each canvas or batch, but one of them is crucial for understanding how to optimize your UI: the Batch Breaking Reason column.

This column will show the reason why the selected batch cannot be merged with the previous batch. Reducing the number of batches is one of the most effective ways to improve UI performance, so it is important to understand what can break batching.

One of the most common reasons, as shown in the screenshot, is the use of UI elements with different textures or materials. In many cases, this can be solved easily by using sprite atlases. The last column shows the name of the game object associated with the batch. You can double-click the name to select the game object in the editor (this is especially useful when you have several objects with the same name).

As of Unity 2017.3, the batch viewer only works in the editor. Batching on the device should usually be the same, so the viewer is still very useful. If you suspect that batching differs on the device, you can use the Frame Debugger described next.

Unity Frame Debugger

Unity's Frame Debugger is a useful tool for reducing the number of draw calls generated by Unity UI. This built-in tool can be accessed through the Window menu in the Unity Editor. When enabled, it displays all draw calls generated by Unity, including those generated by Unity UI.

Note that the Frame Debugger updates itself with the draw calls generated to display the Game View in the Unity Editor, so it can be used to try different UI configurations without entering Play Mode.

The location of Unity UI draw calls depends on the Render Mode selected on the Canvas component being drawn:

  • Screen Space - Overlay will appear in the Canvas.RenderOverlay group.
  • Screen Space - Camera will appear in the Camera.Render group, as a subgroup of Render.TransparentGeometry.
  • World Space will appear as a subgroup of Render.TransparentGeometry, for each World Space camera in which the Canvas is visible.

All UI draw calls can be identified by the "Shader: UI/Default" line (assuming that the UI shader has not been replaced by a custom shader) in the details of the group or draw call. See the highlighted red box in the screenshot below.

By watching this set of lines while adjusting the UI, it is relatively simple to maximize the Canvas's ability to combine UI elements into batches. The most common design-related cause of broken batches is unintentional overlap.

All Unity UI components generate their geometry as a series of quads. However, many UI sprites or UI text glyphs occupy only a small portion of the quads used to represent them, and the rest is empty space. As a result, it is common to find that a UI designer has unintentionally overlaid multiple different quads whose textures come from different materials, so they cannot be batched.

Since Unity UI operates entirely in the transparent queue, any quad that has an unbatchable quad overlaid on top of it must be drawn before the unbatchable quad, and therefore cannot be batched with other quads placed on top of the unbatchable quad.

Consider the case of three quads, A, B and C. Assume that all three quads overlap one another, and also assume that quads A and C use the same material while quad B uses a separate material. Quad B therefore cannot be batched with either A or C.

If the order in the hierarchy (from top to bottom) is A, B, C, then A and C cannot be batched, because B must be drawn on top of A and beneath C. However, if B is placed before or after the batchable quads, then the batchable quads can be batched: B only needs to be drawn before or after the batched quads and does not sit between them.
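The A, B, C case can be modeled in a few lines of Python. This is a minimal sketch, not Unity's batching code: it assumes every quad overlaps every other quad, so draw order must follow hierarchy order and a batch is simply a run of consecutive quads that share a material. The `count_batches` helper and the material names are hypothetical.

```python
from itertools import groupby

# Hypothetical materials for the three quads in the example:
# A and C share one material, B uses another.
MATERIALS = {"A": "atlas_1", "B": "atlas_2", "C": "atlas_1"}


def count_batches(hierarchy_order, materials=MATERIALS):
    """Count batches for fully overlapping quads.

    Assumption: every quad overlaps every other quad, so they must be
    drawn in hierarchy order, and only consecutive quads with the same
    material can share a batch.
    """
    runs = groupby(hierarchy_order, key=lambda quad: materials[quad])
    return sum(1 for _material, _quads in runs)


if __name__ == "__main__":
    for order in (["A", "B", "C"], ["B", "A", "C"], ["A", "C", "B"]):
        print("".join(order), "->", count_batches(order), "batches")
```

Under these assumptions, the order A, B, C yields three batches, while moving B to either end of the list lets A and C share a batch and yields two.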

For further discussion of this issue, see the Child order section of the Canvas chapter.

Instruments & VTune

Xcode’s Instruments and Intel’s VTune allow for extremely deep profiling of Unity UI rebuilds and Canvas batch calculations on Apple or Intel CPUs, respectively. The method names are nearly identical to the profiler labels discussed above in the Unity Profiler section:

  • Canvas::SendWillRenderCanvases is the C++ parent that calls the Canvas.SendWillRenderCanvases C# method and governs that line in the Unity Profiler. It will contain the code used to run the Rebuild process, as described in the previous chapter.

  • Canvas::UpdateBatches is identical to Canvas.BuildBatch, but includes additional boilerplate code not covered by the Unity Profiler label. It runs the actual Canvas Batch Building process, described above.

When used in conjunction with a Unity app built via IL2CPP, these tools can be used to drill down deeper into the transpiled C# code of Canvas::SendWillRenderCanvases. Of primary interest will be the cost of the following methods. (Note: transpiled method names are approximate.)

  • IndexedSet_Sort and CanvasUpdateRegistry_SortLayoutList are used to sort the list of dirty Layout components before the layouts are recalculated. As described above, this involves calculating the number of parent transforms above each Layout component.
  • ClipperRegistry_Cull calls all registered implementers of the IClipRegion interface. Built-in implementers include RectMask2D, which uses the IClippable interface. During ClipperRegistry.Cull calls, RectMask2D components loop over all clippable elements contained within their hierarchy and ask them to update their culling information.
  • Graphic_Rebuild will contain the cost of actually calculating the meshes needed to represent Image, Text or other Graphic-derived components. Beneath this will be several other methods like Graphic_UpdateGeometry and, most notably, Text_OnPopulateMesh.
    • Text_OnPopulateMesh is generally a hotspot when Best Fit is enabled. This is discussed in more detail later in this guide.
    • Mesh modifiers, such as Shadow_ModifyMesh and Outline_ModifyMesh, will also run here. The cost of calculating component drop shadows, outlines and other special effects can be seen via these methods.
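To illustrate why the layout sort in the first bullet has a cost that grows with hierarchy depth, the following sketch orders dirty layout elements by the number of parents above each one. It is an illustration only, not the code in CanvasUpdateRegistry: the `PARENTS` table, the `parent_count` and `sort_dirty_layouts` helpers, and the ascending order are all assumptions made for the example.

```python
# Hypothetical hierarchy: child name -> parent name (None marks the root).
PARENTS = {
    "Canvas": None,
    "Panel": "Canvas",
    "Row": "Panel",
    "Label": "Row",
    "Footer": "Canvas",
}


def parent_count(name, parents=PARENTS):
    """Number of ancestors above `name`, found by walking up to the root."""
    depth = 0
    while parents[name] is not None:
        name = parents[name]
        depth += 1
    return depth


def sort_dirty_layouts(dirty, parents=PARENTS):
    """Order dirty layout elements by how many parents sit above them.

    Assumption: shallower elements come first. Each key computation
    walks the whole parent chain, so deep hierarchies cost more.
    """
    return sorted(dirty, key=lambda name: parent_count(name, parents))


if __name__ == "__main__":
    print(sort_dirty_layouts(["Label", "Footer", "Panel"]))
```

Every element that enters the dirty list pays for a walk up its parent chain, which is why fewer Layout components and shallower hierarchies reduce the time spent here.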

Xcode Frame Debugger & Intel GPA

Low-level frame debugging tools are essential for profiling the cost of individual portions of the batched UI as well as monitoring the cost of UI overdraw. UI overdraw is discussed in more detail later in this guide.

Using the Xcode Frame Debugger

To test whether a given UI is overstressing the GPU, Xcode’s built-in GPU diagnostics tools can be employed. First, configure the project in question to use Metal or OpenGLES3, then make a build and open the resulting Xcode project. Some combinations of Xcode version and device may support OpenGLES 2 frame captures, but there’s no guarantee it will work.

Note: On some versions of Xcode, it is necessary to select the appropriate Graphics API in the Build Scheme in order to make the graphics profiler work. To do this, go to the Product menu in Xcode, expand the Scheme menu item, and choose Edit Scheme.... Select the Run target and go to the Options tab. Change the GPU Frame Capture option to match the API used by your project. Assuming the Unity project is set up to automatically select a graphics API, then most modern iPads will default to using Metal. If in doubt, start the project and look at the debug logs in Xcode. One of the early lines should indicate which rendering path (Metal, GLES3 or GLES2) is being initialized.

Build and run the project on an iOS device. The GPU profiler can be found by showing the Debug pane in Xcode’s Navigator sidebar, and clicking on the FPS entry.

The first point of interest in the GPU profiler is the set of three bars in the center of the screen, labeled “Tiler”, “Renderer”, and “Device”. Of these three:

  • “Tiler” is generally a measure of how stressed the GPU is by processing geometry, which includes time spent in vertex shaders. Generally, a high “Tiler” usage indicates either excessively slow vertex shaders or an excessive number of vertices being drawn.
  • “Renderer” is generally a measure of how stressed the GPU’s pixel pipelines are. Generally, high “Renderer” usage indicates that an application is exceeding the maximum fill-rate of the GPU, or has inefficient fragment shaders.
  • “Device” is a composite measure of overall GPU usage, which includes both “Tiler” and “Renderer” performance. It can generally be ignored, as it will roughly track the higher of the “Tiler” or “Renderer” measurements.

For more information on Xcode’s GPU profiler, see this documentation article.

Xcode’s Frame Debugger can be triggered by clicking on the small ‘Camera’ icon hidden at the bottom of the GPU profiler. It is highlighted by an arrow and a red box in the following screenshot.

After a brief pause, the Frame Debugger’s summary view should appear, like so:

When using the default UI shader, the cost of rendering geometry generated by the Unity UI system will show up under the “UI/Default” shader pass, assuming the default UI shader has not been replaced with a custom shader. It is possible to see this default UI shader in the above screenshot as Render Pipeline “UI/Default.”

Unity UI only generates quads and so the vertex shader is unlikely to stress the tiler pipeline of the GPU. Any problems that appear in this shader pass are likely due to fill-rate issues.
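A rough way to reason about fill-rate pressure is to estimate how many times each screen pixel is shaded by UI quads. The sketch below is a back-of-the-envelope model, not something reported by Xcode or Unity: the `overdraw_factor` helper is hypothetical, and it assumes every quad lies fully on screen and that every pixel inside a quad is shaded, including fully transparent ones.

```python
def overdraw_factor(quads, screen_width, screen_height):
    """Average number of times each screen pixel is shaded by UI quads.

    `quads` is a list of (width, height) sizes in pixels. Assumption:
    every quad lies fully on screen and every pixel inside it is
    shaded, including fully transparent ones.
    """
    shaded = sum(width * height for width, height in quads)
    return shaded / (screen_width * screen_height)


if __name__ == "__main__":
    # Hypothetical UI: a full-screen background plus a half-width panel.
    ui_quads = [(1920, 1080), (960, 1080)]
    print(overdraw_factor(ui_quads, 1920, 1080))
```

In this hypothetical layout the UI alone shades each pixel 1.5 times on average; stacking more full-screen layers raises the figure by one per layer.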

Analyzing profiler results

After gathering profiling data, several conclusions might be drawn. If Canvas.BuildBatch or Canvas::UpdateBatches seems to be using an excessive amount of CPU time, then the likely problem is an excessive number of Canvas Renderer components on a single Canvas. See the Splitting Canvases section of the Canvas chapter.

If an excessive amount of time is spent drawing the UI on the GPU, and the frame debugger indicates that the fragment shader pipeline is the bottleneck, then the UI is likely exceeding the pixel fill rate which the GPU is capable of. The most likely cause is excessive UI overdraw. See the Remediating fill-rate issues section of the Fill-rate, Canvases and input chapter.

If Graphic Rebuilds are using excessive CPU, as seen by a large portion of CPU time going to Canvas.SendWillRenderCanvases or Canvas::SendWillRenderCanvases, then deeper analysis is needed. Some portion of the Graphic Rebuild process is likely responsible.

If a large portion of Canvas.SendWillRenderCanvases is spent inside IndexedSet_Sort or CanvasUpdateRegistry_SortLayoutList, then time is being spent sorting the list of dirty Layout components. Consider reducing the number of Layout components on the Canvas. See the Replacing layouts with RectTransforms and Splitting Canvases sections for possible remediations.

If excessive time seems to be spent in Text_OnPopulateMesh, then the culprit is simply the generation of text meshes. See the Best Fit and Disabling Canvases sections for possible remediations, and consider the advice inside Splitting Canvases if much of the text being rebuilt is not actually having its underlying string data changed.

If time is spent inside Shadow_ModifyMesh or Outline_ModifyMesh (or any other implementation of ModifyMesh), then the problem is excessive time spent calculating mesh modifiers. Consider removing these components and achieving their visual effect via static images.

If there is no particular hotspot within Canvas.SendWillRenderCanvases, or it appears to be running every frame, then the problem is likely that dynamic elements have been grouped together with static elements and are forcing the entire Canvas to rebuild too frequently. See the Splitting Canvases section.
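The CPU-side checks above can be condensed into a small lookup. This is only a convenience sketch that transcribes the paragraphs in this section; the `triage` helper and the table layout are hypothetical, and the method names are the approximate ones used in this guide.

```python
# (hotspot names, likely cause, sections of this guide to consult),
# transcribed from the paragraphs in this section.
REMEDIATIONS = [
    (("Canvas.BuildBatch", "Canvas::UpdateBatches"),
     "too many Canvas Renderer components on a single Canvas",
     ["Splitting Canvases"]),
    (("IndexedSet_Sort", "CanvasUpdateRegistry_SortLayoutList"),
     "sorting the list of dirty Layout components",
     ["Replacing layouts with RectTransforms", "Splitting Canvases"]),
    (("Text_OnPopulateMesh",),
     "generating text meshes",
     ["Best Fit", "Disabling Canvases", "Splitting Canvases"]),
]


def triage(hotspot):
    """Return (likely cause, sections to read) or None if unknown."""
    if hotspot.endswith("_ModifyMesh"):
        # Shadow, Outline and any other ModifyMesh implementation;
        # the guide suggests static images rather than a section.
        return ("calculating mesh modifiers", [])
    for names, cause, sections in REMEDIATIONS:
        if hotspot in names:
            return (cause, sections)
    return None


if __name__ == "__main__":
    print(triage("Text_OnPopulateMesh"))
```

A hotspot that is not in the table returns `None`, which corresponds to the cases that need deeper manual analysis.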

Origin blog.csdn.net/Momo_Da/article/details/93532474