Performance - Performance analysis and optimization in UE4

Excerpts and notes from the talk "Unreal Indie Development Day 2019: Performance Analysis and Optimization in UE4", archived here for reference;

Performance analysis and optimization in UE4 :

Start optimization work as early in the project as possible;

The first step in optimization is to determine whether the bottleneck is on the CPU or the GPU:

The stat unit command shows the total time of each frame, as well as the time spent on the Game thread, the Draw thread and the GPU;

When evaluating performance, avoid profiling in the editor where possible; it is best to profile on the actual target platform. If you are developing a PC game and must profile from the editor, remember to run it in Standalone mode (see the picture above for specific precautions);

The threads above run in parallel, but each one depends on the results produced by the previous one;

The Game thread calculates all game logic, data, etc. The Draw thread uses these results to work out what does and does not need to be rendered, and finally the GPU actually renders the final pixels on the screen. A so-called bottleneck occurs when one thread takes too long and the next thread has to wait for it;
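To make the bottleneck idea concrete, here is a minimal C++ sketch, assuming a simplified pipeline in which the Game, Draw and GPU stages of consecutive frames overlap perfectly, so the frame time is set by the slowest stage. The struct and function names are hypothetical and not part of the UE4 API; the numbers would be read by hand from the stat unit overlay.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Hypothetical per-frame timings in milliseconds, as read off the stat unit overlay.
struct FrameTimings {
    double gameMs;  // Game thread
    double drawMs;  // Draw (render) thread
    double gpuMs;   // GPU
};

// Simplified pipeline model: the three stages overlap across consecutive frames,
// so the steady-state frame time is set by the slowest stage
// (synchronization and buffering overhead are ignored).
double EstimatedFrameMs(const FrameTimings& t) {
    assert(t.gameMs >= 0.0 && t.drawMs >= 0.0 && t.gpuMs >= 0.0);
    return std::max({t.gameMs, t.drawMs, t.gpuMs});
}

// Returns the stage that limits the frame rate, i.e. where optimization pays off.
std::string Bottleneck(const FrameTimings& t) {
    const double slowest = EstimatedFrameMs(t);
    if (slowest == t.gpuMs) return "GPU";
    if (slowest == t.drawMs) return "Draw";
    return "Game";
}
```

For example, with Game 12 ms, Draw 8 ms and GPU 20 ms the frame is GPU bound, so making the game logic faster would not raise the frame rate.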

The role of each thread;

The StartFPSChart and StopFPSChart commands capture the stat unit data over a period of time and write it to a text file in CSV format;
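Once a capture exists, it is usually summarized as an average frame time plus the share of frames that miss the target budget. The sketch below assumes the frame times (in milliseconds) have already been read from the CSV; the exact column layout of the FPS chart output is not covered here, and FrameSummary and Summarize are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical summary of one captured run.
struct FrameSummary {
    double averageMs;
    double averageFps;
    double percentOverBudget;  // share of frames slower than the budget, in percent
};

// frameMs: per-frame times in milliseconds, assumed already read from the CSV.
// budgetMs: target frame time, e.g. 33.3 ms for 30 FPS or 16.7 ms for 60 FPS.
FrameSummary Summarize(const std::vector<double>& frameMs, double budgetMs) {
    assert(!frameMs.empty() && budgetMs > 0.0);
    double totalMs = 0.0;
    std::size_t framesOver = 0;
    for (double ms : frameMs) {
        totalMs += ms;
        if (ms > budgetMs) ++framesOver;
    }
    const double count = static_cast<double>(frameMs.size());
    const double averageMs = totalMs / count;
    // Average FPS is derived from the total time, not by averaging per-frame FPS values.
    return {averageMs, 1000.0 / averageMs, 100.0 * static_cast<double>(framesOver) / count};
}
```

Deriving FPS from the average frame time avoids a common mistake: averaging per-frame FPS values overweights the fast frames and hides hitches.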

The stat startfile and stat stopfile commands are also useful for CPU analysis; they record a capture file that can then be analyzed;

The Unreal Insights tool is similar to the Profiler, but it runs as a standalone program;

Game thread :

All calculations related to game logic are performed on the CPU;

The usual culprit behind Game thread performance problems is complex logic in the Tick event (which runs every frame); if many Actors in the scene use Tick, game performance can suffer badly;

The stat game command displays how much time the game logic takes per frame, and the dumpticks command lists the Actors that are currently ticking;

When complex logic really must run on Tick, reduce its cost by using a timer instead or by lowering the Tick frequency, or disable Tick on Actors that are far away from the player;
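The sketch below shows the throttling idea in plain C++: the expensive update only runs once per interval, and not at all when the object is beyond a distance limit. ThrottledTicker is a hypothetical class, not engine code; in UE4 itself the corresponding controls are the Actor tick interval (SetActorTickInterval), SetActorTickEnabled, and timers set through the world's timer manager.

```cpp
#include <cassert>

// Hypothetical stand-in for an Actor whose expensive logic should not run every frame.
class ThrottledTicker {
public:
    ThrottledTicker(double intervalSeconds, double maxDistance)
        : interval_(intervalSeconds), maxDistance_(maxDistance) {
        assert(intervalSeconds > 0.0 && maxDistance > 0.0);
    }

    // Called every frame with the frame delta and the distance to the player.
    // Returns true when the expensive update actually ran this frame.
    bool Tick(double deltaSeconds, double distanceToPlayer) {
        if (distanceToPlayer > maxDistance_) {
            accumulated_ = 0.0;  // too far away: skip the work entirely
            return false;
        }
        accumulated_ += deltaSeconds;
        if (accumulated_ < interval_) return false;
        accumulated_ -= interval_;  // keep the remainder so the update rate stays steady
        ++updates_;
        return true;
    }

    int Updates() const { return updates_; }

private:
    double interval_;
    double maxDistance_;
    double accumulated_ = 0.0;
    int updates_ = 0;
};
```

With a 0.5 s interval and 0.25 s frames, eight frames trigger only four expensive updates near the player and none when the object is out of range.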

Implement simple fade-in and fade-out effects in the material, so that the work is done on the GPU;

Animation logic that has little to do with gameplay can be implemented using materials;

Some functions have a relatively high performance cost; for example, call Get All Actors Of Class once when play starts (e.g., in BeginPlay) and store the results in an array for later use, instead of calling it every frame;
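A minimal sketch of this caching pattern in plain C++, assuming a hypothetical Actor struct and a hypothetical FindActorIdsOfClass function that stands in for Get All Actors Of Class by scanning every actor. The scan happens once, and per-frame code only reads the cached array.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an actor in the level.
struct Actor {
    int id;
    std::string className;
};

// Hypothetical stand-in for an expensive query such as Get All Actors Of Class:
// it scans every actor in the world each time it is called.
std::vector<int> FindActorIdsOfClass(const std::vector<Actor>& world,
                                     const std::string& className,
                                     std::size_t& actorsScanned) {
    assert(!className.empty());
    std::vector<int> ids;
    for (const Actor& actor : world) {
        ++actorsScanned;
        if (actor.className == className) ids.push_back(actor.id);
    }
    return ids;
}

class EnemyTracker {
public:
    // Run the query once at startup (the BeginPlay equivalent) and cache the result.
    void BeginPlay(const std::vector<Actor>& world, std::size_t& actorsScanned) {
        enemyIds_ = FindActorIdsOfClass(world, "Enemy", actorsScanned);
    }

    // Per-frame code reads the cached array instead of scanning the world again.
    std::size_t Tick() const { return enemyIds_.size(); }

    // The cache has to be kept up to date by hand when enemies spawn.
    void OnEnemySpawned(int id) { enemyIds_.push_back(id); }

private:
    std::vector<int> enemyIds_;
};
```

The trade-off is that a cached array can go stale when actors are spawned or destroyed, so those events must update it explicitly.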

When the logic in the Tick event is quite complicated, consider moving it to C++; UE4 also has a feature (Blueprint Nativization) that can convert Blueprints to C++;

UE4 supports mixing C++ and Blueprints, so the most complex functions can be rewritten in C++ and exposed to Blueprints;

Remember to use Fast Path in Animation Blueprints;

Draw thread :

As a rule of thumb, a typical engine can render at most around 10,000 to 15,000 objects;

The cylinders in the picture above are separate objects, and the numbers show the number of draw calls. One of the cylinders on the right uses two different materials, so it costs an extra draw call;
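The counting rule from the picture can be written down directly: each object costs one draw call per material it uses. This is a hypothetical C++ sketch that only counts the main pass; in practice shadow and depth passes can add further draw calls per object.

```cpp
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical description of a placed mesh: each material slot (mesh section)
// is submitted as its own draw call.
struct PlacedMesh {
    int materialSlots;
};

// Simplified count for the main pass only; shadow and depth passes can add more.
int CountDrawCalls(const std::vector<PlacedMesh>& meshes) {
    return std::accumulate(meshes.begin(), meshes.end(), 0,
                           [](int total, const PlacedMesh& mesh) {
                               assert(mesh.materialSlots >= 1);
                               return total + mesh.materialSlots;
                           });
}
```

Three single-material cylinders cost three draw calls; giving one of them a second material raises the total to four, matching the picture.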

Draw calls have a large impact on performance. Besides the relevant console commands, open-source tools such as RenderDoc can also help analyze draw calls;

Draw calls usually affect performance much more than polygon count does;

A common way to reduce draw calls is to replace many small meshes with one large merged mesh; but this has side effects, for example on culling: the merged mesh is rendered as a whole even when only part of it is visible;
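A rough C++ sketch of why merging helps, assuming the merge combines sections that share a material, so the merged mesh costs about one draw call per unique material. MeshActor and both functions are hypothetical names.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical actor: one entry per material slot.
struct MeshActor {
    std::vector<std::string> materials;
};

// Separate actors: one draw call per material slot of every actor.
int DrawCallsSeparate(const std::vector<MeshActor>& actors) {
    int calls = 0;
    for (const MeshActor& actor : actors) {
        assert(!actor.materials.empty());
        calls += static_cast<int>(actor.materials.size());
    }
    return calls;
}

// Merged mesh (assumption: sections sharing a material are combined),
// so it needs roughly one draw call per unique material.
int DrawCallsMerged(const std::vector<MeshActor>& actors) {
    std::set<std::string> unique;
    for (const MeshActor& actor : actors)
        unique.insert(actor.materials.begin(), actor.materials.end());
    return static_cast<int>(unique.size());
}
```

The culling side effect is the other half of the trade-off: if only one of these pieces is on screen, the separate actors let the rest be culled, while the merged mesh is drawn in full.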

Using LODs can also help reduce draw calls;

Building levels from modular pieces is convenient, but it also increases draw calls, so remember to merge meshes where appropriate;

Instanced rendering also reduces draw calls. For example, the foliage system instances its meshes automatically, while other types of meshes require some manual setup;
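Instancing can be thought of as grouping: all objects that share the same mesh and material are submitted as one batch, with their per-instance transforms in a buffer. The following hypothetical C++ sketch only performs the grouping step and counts the resulting batches; it does not describe how the foliage system is implemented internally.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record of one placed object in a level.
struct Placement {
    std::string mesh;
    std::string material;
};

using BatchKey = std::pair<std::string, std::string>;  // (mesh, material)

// Groups placements the way instanced rendering batches them: all objects sharing
// the same mesh and material form one batch, drawn with a single call
// (per-instance data such as transforms would live in a separate buffer).
std::map<BatchKey, int> BuildInstanceBatches(const std::vector<Placement>& placements) {
    std::map<BatchKey, int> instancesPerBatch;
    for (const Placement& p : placements) {
        assert(!p.mesh.empty() && !p.material.empty());
        ++instancesPerBatch[BatchKey{p.mesh, p.material}];
    }
    return instancesPerBatch;
}
```

In this model 1,000 grass patches and three rocks (one with a different material) drop from 1,003 draw calls to 3.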

HLODs work on a similar principle to Merge Actors, except that HLODs are generated automatically and are non-destructive. A group of objects is baked into an HLOD; like a LOD, the group switches to the baked HLOD at a distance, but each object can still be adjusted individually while editing (the HLODs must then be rebuilt afterwards);
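The draw-call effect of an HLOD cluster can be sketched as a distance switch. This is a hypothetical C++ model, assuming the baked proxy uses a single merged material and therefore costs one draw call; the engine's actual transition settings are not reproduced here.

```cpp
#include <cassert>

// Hypothetical HLOD cluster: a group of actors baked into one proxy mesh.
struct HlodCluster {
    int memberDrawCalls;        // draw calls when every member renders individually
    double transitionDistance;  // beyond this distance the baked proxy is shown
};

// Near the camera the original actors render (and stay individually editable);
// far away the whole cluster is replaced by the proxy, assumed here to use a
// single merged material, i.e. one draw call.
int DrawCallsForCluster(const HlodCluster& cluster, double distanceToCamera) {
    assert(cluster.memberDrawCalls >= 1 && cluster.transitionDistance > 0.0);
    return distanceToCamera > cluster.transitionDistance ? 1 : cluster.memberDrawCalls;
}
```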

GPU thread :

The GPU is what finally draws the pixels on the screen; the easiest way to find performance problems at this stage is to turn rendering features off one at a time with console commands and watch how the GPU time changes;

ProfileGPU can be used not only in the editor; in Development builds it can also write its results out to files;

There are also related commands that show the number of draw calls per material;

The main way to reduce overshading (wasted pixel shading caused by very small or thin triangles) is to use LODs;
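Overshading comes from the GPU shading pixels in 2x2 quads: a triangle that covers only a few pixels still pays for whole quads, so tiny triangles waste work. Switching to a coarser LOD as an object shrinks on screen keeps its triangles from getting that small. The hypothetical C++ sketch below picks an LOD from an approximate projected screen size; the formula and the threshold table are simplifying assumptions, not the engine's exact screen-size definition.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Approximate fraction of the screen height covered by a bounding sphere:
// diameter 2r over the view height 2 * d * tan(fov / 2) at distance d.
double ProjectedScreenSize(double radius, double distance, double verticalFovRadians) {
    assert(radius > 0.0 && distance > 0.0);
    return radius / (distance * std::tan(verticalFovRadians / 2.0));
}

// lodThresholds[i] is the minimum screen size for LOD i, in descending order.
// Returns the most detailed LOD whose threshold the object still meets.
int SelectLod(double screenSize, const std::vector<double>& lodThresholds) {
    assert(!lodThresholds.empty());
    for (std::size_t i = 0; i < lodThresholds.size(); ++i)
        if (screenSize >= lodThresholds[i]) return static_cast<int>(i);
    return static_cast<int>(lodThresholds.size()) - 1;  // tiny on screen: coarsest LOD
}
```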

Use the corresponding view mode in the editor to inspect overdraw;
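Overdraw means the same pixel is shaded several times in one frame, typically by stacked translucent surfaces such as particles. The hypothetical C++ sketch below counts shading passes per pixel for a set of screen-space rectangles, roughly the quantity an overdraw view mode shows as a heat map.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Rect { int x0, y0, x1, y1; };  // half-open pixel rectangle [x0, x1) x [y0, y1)

// Counts how many times each pixel of a width x height target is shaded when the
// given rectangles (e.g. translucent particle sprites) are drawn on top of each other.
std::vector<int> OverdrawMap(int width, int height, const std::vector<Rect>& quads) {
    assert(width > 0 && height > 0);
    std::vector<int> counts(static_cast<std::size_t>(width) * height, 0);
    for (const Rect& r : quads)
        for (int y = std::max(r.y0, 0); y < std::min(r.y1, height); ++y)
            for (int x = std::max(r.x0, 0); x < std::min(r.x1, width); ++x)
                ++counts[static_cast<std::size_t>(y) * width + x];
    return counts;
}

// Average number of shading passes per covered pixel (0 if nothing was drawn).
double AverageOverdraw(const std::vector<int>& counts) {
    long long shaded = 0;
    long long covered = 0;
    for (int c : counts) {
        shaded += c;
        if (c > 0) ++covered;
    }
    return covered == 0 ? 0.0 : static_cast<double>(shaded) / static_cast<double>(covered);
}
```

Two sprites stacked on a 4x4 target, one covering everything and one covering a 2x2 corner, give four pixels shaded twice and an average overdraw of 1.25.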

The Shader Complexity view mode is also very important;

Some techniques for reducing shader complexity: use the Feature Level Switch to select different shader code for different platforms;

Where possible, move complex calculations to the vertex shader (VS), which runs once per vertex rather than once per pixel;
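Rough arithmetic for why this pays off: a calculation in the pixel shader runs for every shaded pixel, while in the vertex shader it runs about once per vertex and the result is interpolated across the triangle. The C++ helpers below are hypothetical and ignore vertex-cache reuse and extra passes; in UE4 materials, Customized UVs are one way to move such work to the vertex shader.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical estimate: a vertex-shader calculation runs about once per vertex
// (vertex-cache reuse and extra passes are ignored).
std::uint64_t VertexShaderEvaluations(std::uint64_t vertexCount) {
    return vertexCount;
}

// Hypothetical estimate: a pixel-shader calculation runs once per covered pixel,
// multiplied by the average overdraw on those pixels.
std::uint64_t PixelShaderEvaluations(std::uint64_t coveredPixels, std::uint64_t averageOverdraw) {
    assert(averageOverdraw >= 1);
    return coveredPixels * averageOverdraw;
}
```

For a 2,000-vertex mesh covering 500,000 pixels with an overdraw of 2, that is 2,000 evaluations instead of 1,000,000. Two caveats: the result is interpolated linearly across each triangle, so this only suits values that vary smoothly; and for a distant mesh with more vertices than covered pixels, the balance flips.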

Particle systems should use Particle Cutouts, which automatically trim the sprite geometry to fit the non-transparent area of the alpha channel more closely and so reduce overdraw; for vegetation, however, you need to trim the mesh by hand (refer to making high-quality vegetation);
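The idea behind the cutout can be sketched as shrinking the sprite geometry to the region where alpha is non-zero, so fully transparent texels are never shaded. The engine fits a polygon with a few vertices; the hypothetical C++ sketch below uses a simpler axis-aligned rectangle and reports how much of the original quad it still covers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct PixelRect { int x0, y0, x1, y1; };  // half-open; empty if x0 >= x1 or y0 >= y1

// Hypothetical helper: tightest axis-aligned rectangle around texels whose alpha
// exceeds the threshold (a simplification of the engine's polygon fitting).
PixelRect OpaqueBounds(const std::vector<float>& alpha, int width, int height, float threshold) {
    assert(alpha.size() == static_cast<std::size_t>(width) * height);
    PixelRect r{width, height, 0, 0};
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            if (alpha[static_cast<std::size_t>(y) * width + x] > threshold) {
                r.x0 = std::min(r.x0, x);
                r.y0 = std::min(r.y0, y);
                r.x1 = std::max(r.x1, x + 1);
                r.y1 = std::max(r.y1, y + 1);
            }
    return r;
}

// Fraction of the full sprite quad that the trimmed rectangle still covers.
double CoveredFraction(const PixelRect& r, int width, int height) {
    if (r.x0 >= r.x1 || r.y0 >= r.y1) return 0.0;
    return static_cast<double>((r.x1 - r.x0) * (r.y1 - r.y0)) / (width * height);
}
```

For a 4x4 sprite whose only opaque texels sit in the middle 2x2 block, the trimmed geometry shades a quarter of the original quad.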

Lighting complexity is also an important part of optimization;

Finally, some optimization suggestions for lighting;


Origin blog.csdn.net/DoomGT/article/details/124551188