Optimizing Unity UI (3): Unity UI Profiling Tools

Version check: 2017.3 - Difficulty: Advanced

Several analysis tools are available for profiling the performance of Unity UI. The major tools are:

  • Unity Profiler
  • Unity Frame Debugger
  • Xcode’s Instruments or Intel VTune
  • Xcode’s Frame Debugger or Intel GPA

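The external tools provide method-level CPU profiling with millisecond (or better) resolution, as well as detailed draw-call and shader profiling. Instructions for setting up and using these tools are beyond the scope of this guide. Note that the Xcode Frame Debugger and Instruments are only usable on IL2CPP builds for Apple platforms, and therefore can currently only be used to profile iOS builds.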

Unity Profiler

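The primary use of the Unity Profiler is comparative profiling: enabling and disabling UI elements while the Unity Profiler is running can quickly narrow down which part of the UI hierarchy is most responsible for a performance problem.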
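The comparison itself comes down to simple arithmetic. Here is a minimal sketch. It assumes you have recorded one baseline timing with the whole UI enabled, plus one timing for each hypothetical subtree with only that subtree disabled. Ranking the subtrees by how much time disappears points at the likeliest culprit. The function name, subtree names and numbers are illustrative and are not part of any Unity API.

```python
from typing import Dict, List, Tuple


def rank_by_savings(baseline_ms: float,
                    disabled_ms: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank UI subtrees by the time saved when each one is disabled.

    baseline_ms: time measured with every UI element enabled (for example the
        sum of Canvas.BuildBatch and Canvas.SendWillRenderCanvases).
    disabled_ms: the same measurement, keyed by the subtree that was disabled.
    """
    savings = [(name, baseline_ms - ms) for name, ms in disabled_ms.items()]
    # Largest saving first: that subtree contributes the most to the cost.
    return sorted(savings, key=lambda item: item[1], reverse=True)


# Hypothetical measurements, in milliseconds.
ranking = rank_by_savings(4.0, {"HUD": 3.5, "Inventory": 1.5, "Minimap": 3.9})
print(ranking[0])  # ('Inventory', 2.5)
```

Frame-to-frame timings are noisy, so it is worth averaging several frames for each configuration before comparing them.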

To analyze this, watch the Canvas.BuildBatch and Canvas.SendWillRenderCanvases lines in the profiler output.

As described previously, Canvas.BuildBatch is the native-code calculation that performs the Canvas Batch Building process.

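Canvas.SendWillRenderCanvases contains the calls to C# scripts subscribed to the Canvas component's willRenderCanvases event. Unity UI's CanvasUpdateRegistry class receives this event and uses it to run the Rebuild process, as described previously. Any dirty UI components are expected to update their Canvas Renderers at this point.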
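Why does so much rebuild work show up under this single profiler line? The toy model below illustrates the deferred pattern. Components only mark themselves dirty, and the real work runs once, when the will-render event fires. The class and method names are hypothetical; this is not Unity's source. The phase order (layout first, then graphics) follows the Rebuild process described previously. Placing a clipping pass between them is an assumption of this sketch. That pass corresponds to the ClipperRegistry.Cull step discussed in the Instruments & VTune section.

```python
from typing import List


class ToyUpdateRegistry:
    """Toy model of a deferred-rebuild registry (hypothetical, not Unity's code).

    Components only mark themselves dirty; the expensive work runs once,
    when the canvas's will-render event fires.
    """

    def __init__(self) -> None:
        self._dirty_layouts: List[str] = []
        self._dirty_graphics: List[str] = []
        self.log: List[str] = []

    def mark_layout_dirty(self, name: str) -> None:
        if name not in self._dirty_layouts:  # queue each component at most once
            self._dirty_layouts.append(name)

    def mark_graphic_dirty(self, name: str) -> None:
        if name not in self._dirty_graphics:
            self._dirty_graphics.append(name)

    def on_will_render_canvases(self) -> None:
        # Layout first, because sizes and positions feed the meshes built later.
        for name in self._dirty_layouts:
            self.log.append(f"layout:{name}")
        self.log.append("cull")  # clipping pass between the two phases
        for name in self._dirty_graphics:
            self.log.append(f"graphic:{name}")
        self._dirty_layouts.clear()
        self._dirty_graphics.clear()


registry = ToyUpdateRegistry()
registry.mark_layout_dirty("Panel")
registry.mark_layout_dirty("Panel")  # duplicate request is ignored
registry.mark_graphic_dirty("Label")
registry.on_will_render_canvases()
print(registry.log)  # ['layout:Panel', 'cull', 'graphic:Label']
```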

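Note: To see differences in UI performance more easily, it is generally advisable to disable all trace categories except "Rendering", "Scripts" and "UI". To do this, click the colored box next to a trace category's name on the left side of the CPU Usage profiler. You can also reorder categories in the CPU profiler by clicking a category's name and dragging it up or down.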

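The UI category is new in Unity 2017.1 and later. Unfortunately, some parts of the UI update process are not categorized correctly, so be careful when reading the UI curve: it may not contain every UI-related call. For example, Canvas.SendWillRenderCanvases is categorized as "UI", but Canvas.BuildBatch is categorized as "Others" and "Rendering".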

Unity 2017.1 and later also include a new UI Profiler. By default, it is the last profiler in the Profiler window. It consists of two timelines and a batch viewer:

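The first timeline shows CPU time spent in two categories: computing layout and rendering. Note that it suffers from the same problem described above, so some UI functions may not be accounted for.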

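The second timeline shows the total number of batches and vertices, and displays event markers. In the previous screenshot, you can see several button click events. These markers can help you work out what caused a CPU spike.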

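Finally, the most useful feature of the UI Profiler is the batch viewer at the bottom. On the left is a tree view of all your Canvases, with a list of the batches each Canvas generated beneath it. The columns provide interesting details about each Canvas or batch. One of them is crucial for understanding how to optimize your UI: the Batch Breaking Reason.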

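This column shows why the selected batch could not be merged with the previous batch. Reducing the number of batches is one of the most effective ways to improve UI performance, so it is important to understand what breaks batches.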

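One of the most common reasons, as shown in the screenshot, is UI elements that use different textures or materials. In many cases this is easily fixed by using sprite atlases. The last column shows the names of the GameObjects associated with the batch. Double-click a name to select that GameObject in the Editor, which is especially useful when several objects share the same name.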
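The sketch below shows why atlasing helps. It is a deliberately simplified model: draws that already sit in their final order are merged whenever they share material and texture. Overlap and reordering are ignored here. The reason strings are illustrative and are not the exact labels the UI Profiler displays. The sprite names and the "icons_atlas" texture are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class UIDraw:
    name: str
    material: str
    texture: str


def build_batches(draws: List[UIDraw]) -> List[Tuple[List[str], str]]:
    """Merge consecutive draws that share material and texture.

    Returns (names in batch, reason the batch started). Overlap and
    reordering are ignored; draws are assumed to be in final order.
    """
    batches: List[Tuple[List[str], str]] = []
    prev: Optional[UIDraw] = None
    for draw in draws:
        if prev is None:
            batches.append(([draw.name], "first batch"))
        elif draw.material != prev.material:
            batches.append(([draw.name], "different material"))
        elif draw.texture != prev.texture:
            batches.append(([draw.name], "different texture"))
        else:
            batches[-1][0].append(draw.name)  # same state: extend current batch
        prev = draw
    return batches


names = ("Sword", "Shield", "Potion")
separate = [UIDraw(n, "UI/Default", n.lower()) for n in names]  # one texture each
atlased = [UIDraw(n, "UI/Default", "icons_atlas") for n in names]  # shared atlas
print(len(build_batches(separate)), len(build_batches(atlased)))  # 3 1
```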

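As of Unity 2017.3, the batch viewer only works in the Editor. Batching should usually be the same on device, so the viewer is still very useful. If you suspect batching differs on device, use the Frame Debugger, described next.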

Unity Frame Debugger

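The Unity Frame Debugger is a useful tool for reducing the number of draw calls generated by Unity UI. This built-in tool is available from the Window menu in the Unity Editor. When enabled, it displays all draw calls generated by Unity, including those generated by Unity UI.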

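Of particular note, the Frame Debugger keeps itself updated with the draw calls generated to display the Game View in the Unity Editor. It can therefore be used to try out different UI configurations without entering Play Mode.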

Where Unity UI draw calls appear depends on the Render Mode selected on the Canvas component being drawn:

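  • Screen Space – Overlay will appear in the Canvas.RenderOverlays group.
  • Screen Space – Camera will appear in the chosen camera's Camera.Render group, as a sub-group of Render.TransparentGeometry.
  • World Space will appear as a sub-group of Render.TransparentGeometry for each camera in which the World Space Canvas is visible.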

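All UI can be identified by the "Shader: UI/Default" line in the details of the group or draw call, assuming the UI shader has not been replaced with a custom shader. See the red box highlighted in the screenshot below.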

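By watching this list of lines while tweaking the UI, it is relatively simple to get the most out of the Canvas's ability to combine UI elements into batches. The most common design-related cause of broken batches is inadvertent overlap.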

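All Unity UI components generate their geometry as a series of quads. However, many UI sprites and text glyphs occupy only a small part of the quad used to represent them, and the rest of the quad is empty space. As a result, UI designers often unintentionally overlap several different quads whose textures come from different materials, and those quads cannot be batched.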

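Because Unity UI operates entirely in the transparent queue, any quad that has an unbatchable quad overlapping it must be drawn before that unbatchable quad. It therefore cannot be batched with other quads placed on top of the unbatchable quad.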

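Consider three quads, A, B and C. Assume all three quads overlap one another, and that quads A and C use the same material while quad B uses a different one. Quad B therefore cannot be batched with A or C.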

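If the order in the hierarchy (top to bottom) is A, B, C, then A and C cannot be batched, because B must be drawn on top of A and beneath C. However, if B is placed before or after the batchable quads, then A and C can be batched. B only needs to be drawn either before or after the batched quads, not between them.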
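The A, B, C example can be reproduced with a small greedy model. Quads are processed in hierarchy order. A quad may join the most recent earlier batch that uses its material, provided it does not have to slide beneath any overlapping quad of a different material on the way. Treat this as a simplified sketch under those assumptions, not Unity's actual batching algorithm, which also sorts by depth and compares other render state. Rectangles are axis-aligned and cover the whole quad, empty space included, which is why invisible overlap still breaks batches.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass(frozen=True)
class Quad:
    name: str
    material: str
    rect: Rect  # the full quad, including any transparent empty space


def overlaps(a: Quad, b: Quad) -> bool:
    ax0, ay0, ax1, ay1 = a.rect
    bx0, by0, bx1, by1 = b.rect
    return ax0 < bx1 and bx0 < ax1 and ay0 < by1 and by0 < ay1


def batch_quads(quads: List[Quad]) -> List[List[str]]:
    """Greedy batching in hierarchy order (earlier = drawn further back)."""
    batches: List[List[Quad]] = []
    for quad in quads:
        target: Optional[List[Quad]] = None
        # Walk back through earlier batches looking for one to join.
        for batch in reversed(batches):
            if batch[0].material == quad.material:
                target = batch
                break
            if any(overlaps(quad, other) for other in batch):
                break  # quad must stay above this batch, so it cannot sink past it
        if target is None:
            batches.append([quad])
        else:
            target.append(quad)
    return [[q.name for q in batch] for batch in batches]


A = Quad("A", "mat1", (0, 0, 10, 10))
B = Quad("B", "mat2", (5, 5, 15, 15))
C = Quad("C", "mat1", (2, 2, 12, 12))
print(batch_quads([A, B, C]))  # [['A'], ['B'], ['C']]
print(batch_quads([A, C, B]))  # [['A', 'C'], ['B']]
```

Moving B so that it no longer overlaps C (for example to (50, 50, 60, 60)) also lets A and C share a batch, even with the original A, B, C order.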

For further discussion of this issue, see the Child order section of the Canvas chapter.

Instruments & VTune

Xcode’s Instruments and Intel’s VTune allow for extremely deep profiling of Unity UI rebuilds and Canvas batch calculations on Apple or Intel CPUs, respectively. The method names are nearly identical to the profiler labels discussed above in the Unity Profiler section:

  • Canvas::SendWillRenderCanvases is the C++ parent that calls the Canvas.SendWillRenderCanvases C# method and governs that line in the Unity Profiler. It will contain the code used to run the Rebuild process, as described in the previous chapter.

  • Canvas::UpdateBatches is identical to Canvas.BuildBatch, but includes additional boilerplate code not covered by the Unity Profiler label. It runs the actual Canvas Batch Building process, described above.

When used in conjunction with a Unity app built via IL2CPP, these tools can be used to drill down deeper into the transpiled C# code of Canvas::SendWillRenderCanvases. Of primary interest will be the cost of the following methods. (Note: transpiled method names are approximate.)

  • IndexedSet_Sort and CanvasUpdateRegistry_SortLayoutList are used to sort the list of dirty Layout components before the layouts are recalculated. As described above, this involves calculating the number of parent transforms above each Layout component.
  • ClipperRegistry_Cull calls all registered implementers of the IClipRegion interface. Built-in implementers include RectMask2D, which uses the IClippable interface. During ClipperRegistry.Cull calls, RectMask2D components loop over all clippable elements contained within their hierarchy and ask them to update their culling information.
  • Graphic_Rebuild will contain the cost of actually calculating the meshes needed to represent Image, Text or other Graphic-derived components. Beneath this will be several other methods like Graphic_UpdateGeometry and, most notably, Text_OnPopulateMesh.
    • Text_OnPopulateMesh is generally a hotspot when Best Fit is enabled. This is discussed in more detail later in this guide.
    • Mesh modifiers, such as Shadow_ModifyMesh and Outline_ModifyMesh, will also run here. The cost of calculating component drop shadows, outlines and other special effects can be seen via these methods.
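To see why the layout sort can become a hotspot, here is a sketch of sorting dirty layout elements by the number of parent transforms above each one. It assumes shallower elements are processed first. The hierarchy is modeled as a hypothetical child-to-parent dictionary, and the element names are illustrative. Counting parents walks up the hierarchy, so the cost grows with both the number of dirty Layout components and the depth of the hierarchy.

```python
from typing import Dict, List, Optional


def parent_count(node: str, parent_of: Dict[str, Optional[str]]) -> int:
    """Count ancestors by walking up the hierarchy: O(depth) per call."""
    count = 0
    current = parent_of.get(node)
    while current is not None:
        count += 1
        current = parent_of.get(current)
    return count


def sort_dirty_layouts(dirty: List[str],
                       parent_of: Dict[str, Optional[str]]) -> List[str]:
    # Shallowest first (the ordering assumed by this sketch).
    return sorted(dirty, key=lambda node: parent_count(node, parent_of))


hierarchy = {"Canvas": None, "Panel": "Canvas", "List": "Panel", "Row": "List"}
print(sort_dirty_layouts(["Row", "Panel", "List"], hierarchy))
# ['Panel', 'List', 'Row']
```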

Xcode Frame Debugger & Intel GPA

Low-level frame debugging tools are essential for profiling the cost of individual portions of the batched UI as well as monitoring the cost of UI overdraw. UI overdraw is discussed in more detail later in this guide.

Using the Xcode Frame Debugger

To test whether a given UI is overstressing the GPU, use Xcode's built-in GPU diagnostics tools. First, configure the project to use Metal or OpenGLES3, then make a build and open the resulting Xcode project. Some combinations of Xcode version and device may support OpenGLES 2 frame captures, but there is no guarantee that this will work.

Note: On some versions of Xcode, it is necessary to select the appropriate Graphics API in the Build Scheme in order to make the graphics profiler work. To do this, go to the Product menu in Xcode, expand the Scheme menu item, and choose Edit Scheme.... Select the Run target and go to the Options tab. Change the GPU Frame Capture option to match the API used by your project. Assuming the Unity project is set up to automatically select a graphics API, then most modern iPads will default to using Metal. If in doubt, start the project and look at the debug logs in Xcode. One of the early lines should indicate which rendering path (Metal, GLES3 or GLES2) is being initialized.

Build and run the project on an iOS device. The GPU profiler can be found by showing the Debug pane in Xcode’s Navigator sidebar, and clicking on the FPS entry.

The first point of interest in the GPU profiler is the set of three bars in the center of the screen, labeled "Tiler", "Renderer", and "Device". Of these three:

  • “Tiler” is generally a measure of how stressed the GPU is by processing geometry, which includes time spent in vertex shaders. Generally, a high “Tiler” usage indicates either excessively slow vertex shaders or an excessive number of vertices being drawn.
  • “Renderer” is generally a measure of how stressed the GPU’s pixel pipelines are. Generally, high “Renderer” usage indicates that an application is exceeding the maximum fill-rate of the GPU, or has inefficient fragment shaders.
  • “Device” is a composite measure of overall GPU usage, which includes both “Tiler” and “Renderer” performance. It can generally be ignored, as it will roughly track the higher of the “Tiler” or “Renderer” measurements.
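The reading of these bars can be summarized as a small decision function. This is a rough sketch only: the 80% "busy" threshold is an arbitrary assumption and not a figure from Xcode or Apple. "Device" is ignored because, as noted above, it roughly tracks the higher of the other two.

```python
def likely_gpu_bottleneck(tiler_pct: float, renderer_pct: float,
                          busy_threshold: float = 80.0) -> str:
    """Rough reading of the Tiler/Renderer bars (threshold is an assumption)."""
    # "Device" roughly tracks max(tiler, renderer), so it is not needed here.
    if max(tiler_pct, renderer_pct) < busy_threshold:
        return "GPU not saturated"
    if renderer_pct >= tiler_pct:
        return "pixel pipeline: fill rate or fragment shaders"
    return "geometry: vertex count or vertex shaders"


print(likely_gpu_bottleneck(tiler_pct=20, renderer_pct=95))
# pixel pipeline: fill rate or fragment shaders
```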

For more information on Xcode’s GPU profiler, see this documentation article.

Xcode’s Frame Debugger can be triggered by clicking on the small ‘Camera’ icon hidden at the bottom of the GPU profiler. It is highlighted by an arrow and a red box in the following screenshot.

After a brief pause, the Frame Debugger’s summary view should appear, like so:

When using the default UI shader, the cost of rendering geometry generated by the Unity UI system will show up under the “UI/Default” shader pass, assuming the default UI shader has not been replaced with a custom shader. It is possible to see this default UI shader in the above screenshot as Render Pipeline “UI/Default.”

Unity UI only generates quads and so the vertex shader is unlikely to stress the tiler pipeline of the GPU. Any problems that appear in this shader pass are likely due to fill-rate issues.

Analyzing profiler results

After gathering profiling data, several conclusions might be drawn. If Canvas.BuildBatch or Canvas::UpdateBatches seems to be using an excessive amount of CPU time, then the likely problem is an excessive number of Canvas Renderer components on a single Canvas. See the Splitting Canvases section of the Canvas chapter.

If an excessive amount of time is spent drawing the UI on the GPU, and the frame debugger indicates that the fragment shader pipeline is the bottleneck, then the UI is likely exceeding the pixel fill rate which the GPU is capable of. The most likely cause is excessive UI overdraw. See the Remediating fill-rate issues section of the Fill-rate, Canvases and input chapter.
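Overdraw can be estimated as the number of pixels shaded per screen pixel. The sketch below sums the on-screen area of every UI quad and divides it by the screen area. It assumes every quad is rasterized in full. Under that assumption, the transparent empty space inside a quad still costs fill rate. The screen size and quad rectangles are hypothetical.

```python
from typing import Iterable, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), pixels


def average_overdraw(quads: Iterable[Rect], width: int, height: int) -> float:
    """Shaded pixels per screen pixel, assuming every quad is fully rasterized."""
    covered = 0.0
    for x0, y0, x1, y1 in quads:
        # Clip to the screen; off-screen parts generate no fragments.
        w = max(0.0, min(x1, width) - max(x0, 0))
        h = max(0.0, min(y1, height) - max(y0, 0))
        covered += w * h
    return covered / (width * height)


screen_w, screen_h = 1920, 1080
ui = [
    (0, 0, 1920, 1080),     # full-screen background image
    (0, 0, 1920, 1080),     # full-screen dimming panel
    (480, 270, 1440, 810),  # popup covering a quarter of the screen
]
print(average_overdraw(ui, screen_w, screen_h))  # 2.25
```

A result of 2.25 means each pixel is shaded two and a quarter times on average. Removing the full-screen background that the dimming panel already hides would bring the figure down to 1.25.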

If Graphic Rebuilds are using excessive CPU, as seen by a large portion of CPU time going to Canvas.SendWillRenderCanvases or Canvas::SendWillRenderCanvases, then deeper analysis is needed. Some portion of the Graphic Rebuild process is likely responsible.

If a large portion of Canvas.SendWillRenderCanvases is spent inside IndexedSet_Sort or CanvasUpdateRegistry_SortLayoutList, then time is being spent sorting the list of dirty Layout components. Consider reducing the number of Layout components on the Canvas. See the Replacing layouts with RectTransforms and Splitting Canvases sections for possible remediations.

If excessive time seems to be spent in Text_OnPopulateMesh, then the culprit is simply the generation of text meshes. See the Best Fit and Disabling Canvases sections for possible remediations, and consider the advice inside Splitting Canvases if much of the text being rebuilt is not actually having its underlying string data changed.

If time is spent inside Shadow_ModifyMesh or Outline_ModifyMesh (or any other implementation of ModifyMesh), then the problem is excessive time spent calculating mesh modifiers. Consider removing these components and achieving their visual effect via static images.

If there is no particular hotspot within Canvas.SendWillRenderCanvases, or it appears to be running every frame, then the problem is likely that dynamic elements have been grouped together with static elements and are forcing the entire Canvas to rebuild too frequently. See the Splitting Canvases section.
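The CPU-side branches of this analysis can be condensed into a lookup. Below is a minimal sketch that assumes you have exported per-method timings, such as self time from Instruments or VTune, into a dictionary. The method labels are the ones named above, and the IL2CPP names are approximate, as noted earlier. The remediation strings paraphrase this section. GPU fill-rate problems are not covered by this sketch.

```python
from typing import Dict, Optional

# (method labels, remediation pointer), paraphrased from the analysis above.
ADVICE = [
    (("Canvas.BuildBatch", "Canvas::UpdateBatches"),
     "Too many Canvas Renderers on one Canvas: see Splitting Canvases"),
    (("IndexedSet_Sort", "CanvasUpdateRegistry_SortLayoutList"),
     "Sorting dirty Layouts: see Replacing layouts with RectTransforms"),
    (("Text_OnPopulateMesh",),
     "Text mesh generation: see Best Fit and Disabling Canvases"),
]


def advise(timings_ms: Dict[str, float]) -> Optional[str]:
    """Map the hottest method in a capture to a remediation pointer."""
    if not timings_ms:
        return None
    hottest = max(timings_ms, key=lambda name: timings_ms[name])
    if hottest.endswith("_ModifyMesh"):
        return "Mesh modifiers: consider static images instead"
    for labels, advice in ADVICE:
        if hottest in labels:
            return advice
    # No specific hotspot: suspect dynamic and static elements sharing a Canvas.
    return "No single hotspot: see Splitting Canvases"


print(advise({"Text_OnPopulateMesh": 3.0, "IndexedSet_Sort": 0.5}))
# Text mesh generation: see Best Fit and Disabling Canvases
```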


Reposted from blog.csdn.net/Momo_Da/article/details/93532474