UE4 Optimization Tips and Experience

Reposted from: https://dawnarc.com/2016/12/ue4%E4%BC%98%E5%8C%96%E5%BB%BA%E8%AE%AE%E4%B8%8E%E7%BB%8F%E9%AA%8C/

These are notes on issues I ran into while working on projects. They serve as a memo for myself, and are shared so that others can avoid the same detours.

Miscellaneous notes
  1. The timings reported by GPUProfile are not very accurate in editor mode, because the editor's own cost is included. If you need to profile, it is best to do so in Game mode.

  2. UE4 does not support a 640x480 resolution. Running at this resolution crashes the program (observed in version 4.4; I do not know whether the problem still exists in the latest version).

  3. If a character has many Components that need to be attached, attach them when they are actually needed rather than attaching everything at load time. Otherwise, with many characters in the scene, there will be a serious performance problem.
    For example: if there are hundreds of characters in the scene but not every character needs a camera and a spring arm, do not create the camera and spring arm components in the constructor.

  4. UE4 is not very sensitive to triangle count, even on mobile. On an iPad 4, a scene with 50 million triangles can hold a stable 30 fps. Mobile is mainly sensitive to Map Size and material complexity.

Code compilation optimization
  1. C++ is 100 to 1000 times faster than Blueprint
    [Test] Blueprint vs C++ Performance vs Nativized BP
    https://www.reddit.com/r/unrealengine/comments/6qtxy3/test_blueprint_vs_c_performance_vs_nativized_bp/
    Thanks to Zhihu user 金木研 for the correction: the results above were measured in editor mode (non-nativized Blueprint). If Blueprints are converted to native code when packaging, the performance gap between Blueprint and C++ is no more than a factor of two. Before roughly 4.18 (I forget the exact version), the Blueprint nativization algorithm was not well optimized: for example, a single addition was translated into its own function call, which made the execution stack very deep. Epic has kept improving the nativization algorithm since then, and the current version is well optimized.

  2. When transforming vectors in C++, prefer FTransform::TransformXXX() and FTransform::InverseTransformXXX() over FQuat::RotateVector() and FQuat::UnrotateVector(). The former makes more use of the vector instructions supported by current hardware and so gets more out of it. The latter is cross-platform C++ code that evaluates the formula step by step; it does end up using hardware vector instructions, but relatively few.
    UE4 handles this for you: FTransform::TransformXXX() uses the hardware instructions if the current hardware supports them, and otherwise falls back to FQuat::RotateVector().

  3. VS2019 improves C++ compilation speed and optimizes more deeply for the CPU's AVX/AVX2 vector instruction sets. Microsoft's Xbox ATG engineering team benchmarked this using UE4's Infiltrator demo:

    • Compilation speed: for full builds, VS2019 (16.2) is 3.5 times as fast as VS2017 (15.9); for incremental builds, VS2019 (16.2) is 1.6 times as fast as VS2017 (15.9).
    • Code optimization: using the game's frame rate as the metric, VS 16.2 improves on 16.0 by 2% to 3%, and 16.0 improves on 15.9 by up to 2.8%. In other words, code compiled with VS2019 can run at a frame rate about 5% higher than the same code compiled with VS2017.

    For more details, see the Microsoft C++ Team Blog.

Code and algorithm optimization (UE4 API-related only)
  1. When emptying a TArray that will continue to be used, call Reset() instead of Empty(), because Reset() does not free the allocated memory.

  2. When removing elements from a TArray, if the order of the elements does not matter, use RemoveAtSwap() instead of RemoveAt(). The former fills the hole (the invalid memory left behind by the removed elements) with elements from the end of the array, while the latter shifts every element after the hole forward. In terms of time complexity, the former is O(RemovedCount) and the latter is O(ArrayNum).

  3. TSoftObjectPtr internally uses RTTI dynamic_cast, which is very expensive. Avoid heavy use at run time, especially on the server. If you have to use it frequently at run time, my own approach is to build a map as a resource-loading cache: the key is the resource path, and the value is a custom struct holding the object pointer and a resource-type enum. On retrieval, static_cast to the concrete type based on the enum value.
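
    Items 1 and 2 above can be illustrated outside the engine. The following is a minimal sketch using std::vector as a stand-in for TArray; the function names (ResetSketch, EmptySketch, RemoveAtSwapSketch, RemoveAtSketch) are hypothetical and are not UE4 API, they only mirror the behavior described above.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Analog of TArray::Reset(): drop the elements but keep the allocation,
// so refilling the container does not need to allocate again.
template <typename T>
void ResetSketch(std::vector<T>& items) {
    items.clear();
}

// Analog of TArray::Empty(): drop the elements and give up the allocation
// by swapping with a fresh, empty vector.
template <typename T>
void EmptySketch(std::vector<T>& items) {
    std::vector<T>().swap(items);
}

// Analog of RemoveAtSwap(): overwrite the removed slot with the last
// element. Constant work per removal, but element order is not preserved.
template <typename T>
void RemoveAtSwapSketch(std::vector<T>& items, std::size_t index) {
    assert(index < items.size());
    if (index != items.size() - 1) {
        items[index] = std::move(items.back());
    }
    items.pop_back();
}

// Analog of RemoveAt(): order is preserved, but every element after the
// removed slot is shifted down, so the cost grows with the array size.
template <typename T>
void RemoveAtSketch(std::vector<T>& items, std::size_t index) {
    assert(index < items.size());
    items.erase(items.begin() + static_cast<std::ptrdiff_t>(index));
}
```

    Removing index 1 from {10, 20, 30, 40} gives {10, 40, 30} with the swap variant and {10, 30, 40} with the order-preserving variant.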

Visibility culling
  1. Enable Occlusion Culling (Project Settings -> Engine -> Rendering -> Occlusion Culling; it is on by default). If necessary, cull more aggressively based on screen size to improve rendering efficiency (the cost is that objects can disappear abruptly) by increasing the following property values:
    Min Screen Radius for Lights
    Min Screen Radius for Early Z Pass
    Min Screen Radius for Cascaded Shadow Maps

  2. Use a Cull Distance Volume for fine-grained distance culling. The Occlusion Culling settings in Project Settings can only apply a single threshold, whereas a Cull Distance Volume can define multiple levels of culling parameters.

Lighting optimization
  1. Cost of the three dynamic light types, from low to high:
    Directional Light < Point Light < Spot Light.
    Once the number of lights in the scene reaches a certain magnitude, the performance gap between the three is an order of magnitude.

    I could not find a clear statement in the official UE4 documentation on whether Point Light or Spot Light is actually more expensive. The relative cost of the two may differ depending on the usage scenario. Early official Unity documentation describes the GPU cost of the two light types as follows:
    Point Light: They have an average cost on the graphics processor (though point light shadows are the most expensive).
    Spot Light: They are the most expensive on the graphics processor.
    Leaving aside other factors such as memory cost and looking at GPU cost alone, Spot Light is more expensive than Point Light.

    Early Unity documentation: Light
    http://www.ceeger.com/Components/class-Light.html
    Thanks to Zhihu user 刘相敬 for the following advice:
    Without shadows, attenuation for a point light is much simpler to compute than for a spot light, because it depends only on distance. A spot light has to compute distance attenuation plus the attenuation between its inner and outer cone angles using cos/sin instructions, so its per-pixel cost is noticeably higher. In practice, however, a spot light illuminates a much smaller area than a point light, so in the vast majority of scenes it actually costs less than a point light. This assumes deferred rendering or cluster-based lighting, where lights are culled. Unity's Forward rendering does not cull lights: every pixel is shaded for both point lights and spot lights, so spot lights cost considerably more than point lights (Forward Add rendering). In addition, Unity's default lights do not compute the inner and outer cone angles directly but rely on a texture to simulate the attenuation, so a Unity spot light costs one more texture sample than a point light.

     

  2. When building lightmaps, if the scene has no Lightmass Importance Volume, indirect lighting samples are generated for the whole scene to produce the Indirect Lighting Cache. For a game with large scenes this is quite wasteful: areas the character cannot reach or see do not need an Indirect Lighting Cache. In that case, place a Lightmass Importance Volume in the scene so that the Indirect Lighting Cache is generated only for the specified region, which saves a lot of lighting build time.

  3. Try not to enable Cast Volumetric Shadow on point lights and spot lights; by default only directional lights have this option on. With it enabled, the performance cost is about three times the cost with it disabled. When it is disabled, shadows are computed with Shadow Mapping; when it is enabled, they are computed with Shadow Volume. The former is less precise than the latter but requires less computation.

  4. If volumetric fog is enabled, it is recommended to change lights to static lights, so that the volumetric fog data is precomputed during Build Lighting. This significantly improves volumetric fog performance, which is otherwise very expensive.

  5. If the scene contains no Static Light (all lights are Movable Light or Stationary Light), disable Static Lighting to save the associated overhead (such as the computation related to LightMaps and ShadowMaps). To disable it: Project Settings -> Engine -> Rendering -> Lighting -> uncheck Allow Static Lighting.

    When fully dynamic lighting is the performance bottleneck, disabling Static Lighting improves performance. Test case: in one of my game scenes where lighting was one of the bottlenecks, I set r.ScreenPercentage to 400 as a stress test, and the frame rate rose by about 20 after disabling Static Lighting. Because the scene had no static lights, disabling it caused no loss in lighting quality.

     

  6. Keep Support global clip plane for Planar Reflections disabled. It is off by default, and enabling it is very expensive.

  7. AO performance optimization. In very large scenes, lighting is usually one of the performance bottlenecks, especially with dynamic lights. Disabling AO can then raise the frame rate substantially (AO is enabled by default; earlier versions had it off by default). When AO is enabled (Project Settings -> Engine -> Rendering -> Default Settings -> Ambient Occlusion), the engine defaults to SSAO (Screen Space Ambient Occlusion). SSAO cannot be precomputed, so its GPU overhead is large. You can switch to DFAO (Distance Field Ambient Occlusion) to improve performance, because DFAO can be precomputed, at the cost of increased memory usage.
    How to enable DFAO:
    Distance Field Ambient Occlusion
    https://docs.unrealengine.com/en-us/Engine/Rendering/LightingAndShadows/DistanceFieldAmbientOcclusion
    Two options related to DFAO optimization:

    • Compress Mesh Distance Fields: compresses the Distance Fields volume texture to reduce memory usage, at the cost of hitches when using Level Streaming.
    • Eight Bit Mesh Distance Fields: changes the Distance Fields volume texture format from 16-bit to 8-bit, at the cost of coarser-looking AO.

  8. If the scene contains many dynamic point lights and spot lights, dynamically calling SetVisibility and SetHiddenInGame on the LightComponents based on Distance Field can improve performance by 30% to 60%. This conclusion comes from studying the source of a paid Marketplace plugin, Dynamic Lighting Portal System (Performance Booster). The engine already applies Occlusion Culling to lights, so explaining why SetVisibility and SetHiddenInGame still give such a large boost would require a careful study of the UE4 rendering code. My own tentative speculation, taking deferred shading as an example: when each light is processed per pixel in screen space, the corresponding computation is still performed even if the light has no significant effect on the current pixel, and the rendering layer may not optimize for what the game logic above it knows. When the plugin hides and disables the lights, the per-pixel lighting pass skips them entirely, which can greatly improve performance.

Shadow optimization
  1. If you use a non-static Directional Light (Stationary or Movable) and there are many units in the scene, be sure to set Dynamic Shadow Distance (the default is 0, meaning disabled).
    Test: with 500 Actors on screen, a camera height of 4000 and a Stationary Directional Light, the Dynamic Shadow Distance StationaryLight property should be greater than the straight-line distance from the camera to the Actors (note: this is the distance to each individual Actor, so set the value generously, for example 5000). Otherwise the frame rate drops from 200 fps to 100 fps.
    Why enabling Dynamic Shadow Distance improves performance:
    Dynamic Shadow Distance specifies the distance within which dynamic shadows are used. Beyond this distance, shadows fade into static shadows, and performance still improves after the fade.
  2. Controlling Cast Shadow through game logic
    Lights provide the DistanceField Shadow Distance property to control shadow casting by distance from the camera, but it applies to everything equally. For example: suppose the performance bottleneck is the shadows cast by a large number of monsters, while the shadows cast by distant mountains, trees and buildings have little effect on performance. Using DistanceField Shadow Distance here would make the scene look worse for little gain. The recommended approach is to control this in program logic: if the object is a monster, enable its shadow only while it is within a certain distance of the camera.
    The switch for whether an object casts shadows:

    void UPrimitiveComponent::SetCastShadow(bool NewCastShadow)
    
  3. Dynamic lights are very expensive. If you want dynamic lights and still need to maintain performance, lower the shadow quality level: the default is level 3 (Epic), and it can be changed to level 2 (High).
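
    The distance-based shadow switch from item 2 can be sketched outside the engine as follows. This is only an illustration under stated assumptions: Vec3, Monster and UpdateMonsterShadows are hypothetical names, and in a real project the flag change would be forwarded to UPrimitiveComponent::SetCastShadow on the monster's mesh component.

```cpp
#include <cassert>
#include <vector>

struct Vec3 {
    float x, y, z;
};

// Squared distance avoids a square root per monster per update.
inline float DistSquared(const Vec3& a, const Vec3& b) {
    const float dx = a.x - b.x;
    const float dy = a.y - b.y;
    const float dz = a.z - b.z;
    return dx * dx + dy * dy + dz * dz;
}

// Hypothetical stand-in for a monster Actor.
struct Monster {
    Vec3 position;
    bool castsShadow;
};

// Enable shadows only for monsters within maxShadowDistance of the camera.
// Returns the number of monsters whose state flipped, so the switch is
// only touched when something actually changes.
int UpdateMonsterShadows(std::vector<Monster>& monsters,
                         const Vec3& camera,
                         float maxShadowDistance) {
    const float maxDistSq = maxShadowDistance * maxShadowDistance;
    int changed = 0;
    for (Monster& m : monsters) {
        const bool shouldCast = DistSquared(m.position, camera) <= maxDistSq;
        if (shouldCast != m.castsShadow) {
            m.castsShadow = shouldCast;  // in UE4: Mesh->SetCastShadow(shouldCast);
            ++changed;
        }
    }
    return changed;
}
```

    Such an update does not need to run every frame; running it on a timer keeps the per-frame cost low when there are many monsters.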

Material optimization
  1. Material blend modes by performance, from fast to slow: Opaque -> Masked -> Translucent.

  2. If the scene has many units (for example 500), these units must have material LODs, and translucent materials should be removed as much as possible (for example, drop the translucent effects entirely in the last two LOD levels). Otherwise the performance cost grows exponentially.

  3. If BasePass is expensive in the GPU Visualizer, the cause is largely high material complexity.

  4. Decal cost depends on the number of pixels covered. Never use decals casually for gameplay features; leave them to artists dressing the scene. For example, if the program needs to draw a range marker and the range is large, do not use a decal; switch to drawn lines or a texture instead. If the scene requires extensive use of decals, create and destroy them dynamically according to visibility. SetVisibility alone is not enough, because hidden decals still carry a large overhead (though this may be a bug related to terrain edited in the editor: the UE4 scene editor has many bugs, and after an engine upgrade in particular, terrain created in an older version can show all kinds of inexplicable bugs in the new version).

  5. Plan the set of materials used in a scene in advance, and pick only from the planned set when assembling the scene. Too many distinct materials on screen increases draw calls. In particular, when artists assemble a scene from a patchwork of assets found online, the number of material types can easily shoot up.

Performance Guidelines for Artists and Designers
https://docs.unrealengine.com/latest/INT/Engine/Performance/Guidelines/

Vegetation Optimization

  1. When editing terrain, use Instanced Static Meshes. Instancing increases GPU cost but can significantly reduce CPU overhead. Note: in practice, instancing cannot serve as the main way to reduce the number of CPU draw calls, because a real game scene cannot consist entirely of instanced meshes, and even a screen full of vegetation does not necessarily use instanced meshes. To reduce draw calls, reduce the number of material types and increase material reuse.

  2. With a large number of instances (such as one million), calling RemoveInstance or UpdateInstanceTransform several times within one frame makes the frame rate plummet.
    Optimization approach: before operating on the instances, set UHierarchicalInstancedStaticMeshComponent::bAutoRebuildTreeOnInstanceChanges to false and perform all the instance operations you need. When the operations are finished, set bAutoRebuildTreeOnInstanceChanges back to true and call BuildTreeIfOutdated(true, false);. This significantly reduces the performance loss caused by operating on one million instances.

  3. If vegetation material cost becomes the bottleneck, prefer increasing the triangle count over using Translucent materials, and use Masked sparingly. For example, build the whole shape of a blade of grass out of triangles rather than using a flat card with a Translucent material or a Mask.

  4. Set an appropriate Cull Distance on Instanced Meshes.
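
    The batching idea in item 2 (turn off automatic rebuilds, apply all changes, then rebuild once) can be shown with a toy model. This is a sketch only: InstanceSetSketch and its members are hypothetical and merely mimic the roles of bAutoRebuildTreeOnInstanceChanges and BuildTreeIfOutdated; the "rebuild" here is just a counter standing in for the expensive tree build.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for an instanced-mesh component whose spatial tree has to
// be rebuilt after its instances change.
class InstanceSetSketch {
public:
    // Plays the role of bAutoRebuildTreeOnInstanceChanges.
    bool autoRebuildOnChange = true;

    void AddInstance(float position) {
        positions_.push_back(position);
        MarkChanged();
    }

    void RemoveInstance(std::size_t index) {
        positions_[index] = positions_.back();  // swap-remove, order not kept
        positions_.pop_back();
        MarkChanged();
    }

    // Rebuild only if something changed since the last build.
    void BuildTreeIfOutdated() {
        if (outdated_) {
            ++rebuildCount_;  // the expensive work would happen here
            outdated_ = false;
        }
    }

    int RebuildCount() const { return rebuildCount_; }
    std::size_t Num() const { return positions_.size(); }

private:
    void MarkChanged() {
        outdated_ = true;
        if (autoRebuildOnChange) {
            BuildTreeIfOutdated();
        }
    }

    std::vector<float> positions_;
    bool outdated_ = false;
    int rebuildCount_ = 0;
};
```

    With automatic rebuilds left on, 100 changes trigger 100 rebuilds in this model; with them turned off during the batch and one explicit BuildTreeIfOutdated() call afterwards, the same changes trigger a single rebuild.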

Physics and collision optimization
  1. Set Generate Overlap Events to false on BoxComponents. If you do not need Overlap events, set this property to false; the default is true. Once the number of BoxComponents reaches a certain magnitude, the cost with Generate Overlap Events on is twice the cost with it off.

  2. If physics is not needed, set Simulate Physics to false.

  3. If you do not need Hit events, set Simulation Generates Hit Events to false.

  4. If the scene contains many object types (WorldStatic, WorldDynamic, Pawn, etc.) and many objects of each type, configure the Object Responses in the Collision settings carefully: every channel that can be ignored should be set to Ignore. If the object types in the scene are fairly uniform, then even with hundreds of objects of that type, setting Object Response to Block or Overlap has little effect on performance.

  5. For a large-scale RTS game with massive numbers of units in the scene (such as a large StarCraft 2 zergling swarm), avoid UE4's built-in Collision wherever possible; otherwise the frame rate plummets.
    My suggestion is to implement a simple custom collision yourself, for example spherical collision: compute the straight-line distance between units to decide whether they collide, and run the check less often, for example every 0.1 seconds. With this approach, if the number of units is large, you also need to write a structure similar to a Distance Field or an octree to cache the unit list, so that computing the distances between units does not require looping over the whole list.
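
    The custom sphere collision from item 5 can be sketched as follows. To keep the example short it uses a uniform grid in 2D instead of the octree mentioned above; that substitution, and the names Unit and FindOverlaps, are assumptions made for illustration and not UE4 API.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

struct Unit {
    float x, y, radius;
};

// Broad phase with a uniform grid: each unit is only tested against units
// in its own cell and the 8 neighboring cells.
// cellSize must be at least the largest unit diameter, so that two
// overlapping units always lie in the same or adjacent cells.
std::vector<std::pair<std::size_t, std::size_t>>
FindOverlaps(const std::vector<Unit>& units, float cellSize) {
    auto cellOf = [cellSize](float v) {
        return static_cast<std::int32_t>(std::floor(v / cellSize));
    };
    auto keyOf = [](std::int32_t cx, std::int32_t cy) {
        return (static_cast<std::uint64_t>(static_cast<std::uint32_t>(cx)) << 32) |
               static_cast<std::uint64_t>(static_cast<std::uint32_t>(cy));
    };

    // Bucket unit indices by grid cell.
    std::unordered_map<std::uint64_t, std::vector<std::size_t>> grid;
    for (std::size_t i = 0; i < units.size(); ++i) {
        grid[keyOf(cellOf(units[i].x), cellOf(units[i].y))].push_back(i);
    }

    std::vector<std::pair<std::size_t, std::size_t>> hits;
    for (std::size_t i = 0; i < units.size(); ++i) {
        const std::int32_t cx = cellOf(units[i].x);
        const std::int32_t cy = cellOf(units[i].y);
        for (std::int32_t dx = -1; dx <= 1; ++dx) {
            for (std::int32_t dy = -1; dy <= 1; ++dy) {
                const auto it = grid.find(keyOf(cx + dx, cy + dy));
                if (it == grid.end()) continue;
                for (std::size_t j : it->second) {
                    if (j <= i) continue;  // report each pair once
                    const float ddx = units[i].x - units[j].x;
                    const float ddy = units[i].y - units[j].y;
                    const float r = units[i].radius + units[j].radius;
                    if (ddx * ddx + ddy * ddy <= r * r) {
                        hits.emplace_back(i, j);
                    }
                }
            }
        }
    }
    return hits;
}
```

    In a game loop this function would be called on a timer (for example every 0.1 seconds, as suggested above) rather than every frame. The order of the returned pairs is not specified.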

Animation Optimization
  1. Open the character Blueprint -> MeshComponent -> in the Details panel, under the Optimization category -> check Enable Update Rate Optimizations.

  2. Run Tick and RefreshBoneTransforms only for SkinnedMeshes that are being rendered:

    USkinnedMeshComponent::VisibilityBasedAnimTickOption = EVisibilityBasedAnimTickOption::OnlyTickPoseWhenRendered;
    

    The default is AlwaysTickPoseAndRefreshBones, which runs Tick and RefreshBoneTransforms regardless of whether the mesh is being rendered (that is, within the visible region).
    VisibilityBasedAnimTickOption was originally named SkinnedMeshUpdateFlag, was renamed MeshComponentUpdateFlag in versions before 4.21, and has been called VisibilityBasedAnimTickOption since 4.21.

    If animation Tick is turned off, the Tick event logic in the Animation Blueprint stops running. If RefreshBoneTransforms is turned off, bone transform logic such as Transform (Modify) Bone stops working. AnimNotify is not affected by this option.

     

  3. In Animation Blueprint logic, access member variables directly wherever possible. The engine enables an optimization by default: member variables of the Animation Blueprint are copied in native code set up at compile time, which avoids entering the Blueprint Virtual Machine at run time to execute Blueprint code, since the Blueprint VM is slow.
    By default, the parameter types optimized at compile time include:
    Member Variables;
    Negated Boolean Member Variables;
    Members of a Nested Structure;
    For details, see the official documentation:
    Animation Optimization
    https://docs.unrealengine.com/en-us/Engine/Animation/Optimization
    Animation Fast Path Optimization
    https://docs.unrealengine.com/en-us/Engine/Animation/Optimization/FastPath

UI optimization
  1. Where the HUD can do the job, do not use UMG. Create a Widget object only when it needs to be displayed and destroy it when it is not displayed; a large number of UMG objects costs a lot of performance.
    For example, in a scene with a thousand units where each unit creates a WidgetComponent, the GPU overhead is huge even if the WidgetComponents display nothing.

  2. Do not use UMG to replace the mouse cursor. For display logic that needs fast response, a cursor made with UMG shows a clearly visible delay (which shows how expensive UMG is). Use Hardware Cursors instead of a UMG cursor.

Epic Games engineers share: how to optimize UI for UE4 on mobile platforms
http://youxiputao.com/articles/11743

Movement optimization
  1. With a large number of moving Pawns (such as 500), calling AddMovementInput in Tick roughly halves the frame rate (for example from 90 fps down to a little over 40). For units that do not need to move, stop calling AddMovementInput() to improve performance.

Special effects optimization
  1. Avoid the Volume domain where possible; it significantly increases GPU overhead. Use profilegpu to measure the Volume cost.

AI optimization
  1. If a character does not need a Controller, do not spawn one for it. If a character stays still for a long time, call UnPossessed() on it, and call PossessedBy() again once it needs to move.
    Test: with 500 characters and AI Controller Class set to null, AIController and PlayerController respectively, the frame rates were 120 fps, 100 fps and 75 fps.

Dedicated Server optimization
  1. Strip animation data when cooking the server
    Project Settings -> Engine -> Animation -> check Strip Animation Data on Dedicated Server.
    If the animations contain Notify events that trigger data changes, checking this option causes problems. Make sure the Notifies attached to animations are purely cosmetic and not involved in game logic.

  2. Disable character physics simulation in Server mode
    Set FBodyInstance->bSimulatePhysics to false. The default is false.
    Set SkeletalMeshComponent::bEnablePhysicsOnDedicatedServer to false; the default is true. Note that this makes physics checks rely on the client, which carries a risk of cheating. Changing bEnablePhysicsOnDedicatedServer at run time has no effect.

  3. Disable Collision in Server mode
    Set UPrimitiveComponent->bGenerateOverlapEvents to false. For the CollisionComponent of a character Blueprint the default is true.

  4. In Server mode, detach all purely decorative components from the character.

  5. Do not change the AnimInstance Root Motion Mode to Root Motion from Everything. Keep the default value, Root Motion from Montages Only, wherever possible to reduce the amount of animation synchronization the server has to compute.

  6. 4.20 added a new feature for Dedicated Server optimization: Replication Graph. As a first approximation it can be understood as LOD for network communication (though Replication Graph offers more than network-level LOD; see the official documentation for the other features). Without this feature, when the server calls a Multicast function, players several kilometers away also receive the Multicast, even though such distant players may not need the data immediately and could wait until they come within line of sight, at which point you would call ForceNetUpdate() manually. With Replication Graph, the engine can manage this kind of manual optimization itself.

Reference material

Performance and Profiling
https://docs.unrealengine.com/en-us/Engine/Performance

Achieve good performance and high-quality visual effects in UE4 by optimizing
http://gad.qq.com/program/translateview/7160166

CPU Profiling
https://docs.unrealengine.com/en-us/Engine/Performance/CPU

GPU Profiling
https://docs.unrealengine.com/en-us/Engine/Performance/GPU

Unreal Engine 4 Optimization Tutorial, Part 1-4
https://software.intel.com/en-us/articles/unreal-engine-4-optimization-tutorial-part-1
https://software.intel.com/en-us/articles/unreal-engine-4-optimization-tutorial-part-2
https://software.intel.com/en-us/articles/unreal-engine-4-optimization-tutorial-part-3
https://software.intel.com/en-us/articles/unreal-engine-4-optimization-tutorial-part-4

Optimizing and Profiling Games with Unreal Engine 4
http://vincentloignon.com/blog/optimizing-and-profiling-games-with-unreal-engine-4/

Dynamic Lighting Portal System (Performance Booster)
https://www.unrealengine.com/marketplace/dynamic-lighting-portal-system-performance-booster

Performance Optimization: Shadows Triggering Zones
https://www.unrealengine.com/marketplace/performance-optimization-shadows-triggering-zones

Unreal Insights(New feature in v4.23)
https://www.youtube.com/watch?v=TygjPe9XHTw

Virtual Texturing(New feature in v4.23)
https://www.youtube.com/watch?v=fhoZ2qMAfa4

Origin www.cnblogs.com/sevenyuan/p/11842042.html