The underlying principles of Unity memory management you should know

This is the first article from the Hongliu Academy (洪流学堂) official account. Hongliu Academy will keep you a few steps ahead.

This article consists mainly of notes on, and supplementary material for, Unity's official video "A Look at Unity Memory Management". If you have the time, studying the original video is strongly recommended:
https://www.bilibili.com/video/BV1aJ411t7N6


1. What is memory

Memory is divided into physical memory and virtual memory.

Regarding physical memory:

CPU access to main memory is a slow process.

When the CPU needs data from memory, it first checks its own caches (L1, L2, ...); only when all of them miss does it fetch a whole cache line from main memory into the CPU cache. We should therefore keep the data the CPU works on as contiguous as possible, to avoid excessive traffic between the CPU and main memory.

To address this and reduce cache misses, Unity introduced ECS and DOTS, which turn scattered data into contiguous blocks of memory. (However, as of this writing (2021-05-13) DOTS is still unstable; its API can change incompatibly between versions, so using it in production projects is risky.)

About virtual memory:

When a computer's physical memory runs low, the operating system swaps rarely-used data (dead memory) out to the hard disk; this is called memory swapping.

However, mobile phones do not perform memory swapping. First, the storage I/O of mobile devices is much slower than that of PCs; second, mobile flash storage tolerates far fewer write/erase cycles. Swapping on a phone would therefore be slow and would shorten the device's lifespan, so Android does not swap. iOS can compress inactive memory to make more memory actually available; Android lacks this ability.

For mobile and PC:

The difference between a mobile device and a PC is that a phone has no discrete graphics card or dedicated video memory: the CPU and GPU share the same memory. A phone's memory is also smaller, with fewer cache levels and smaller caches; a desktop's L3 cache is about 8~16 MB, while a phone's is only about 2 MB.

To sum up, by every measure a phone has much less memory than a PC, so out-of-memory problems are far more likely to occur on mobile.

2. Android memory management

Android is developed based on Linux, so Android's memory management is very similar to Linux.

The basic unit of Android memory management is the page, typically 4 KB. Memory is allocated and reclaimed in page units, i.e. 4 KB at a time. Android memory is split into user space and kernel space; kernel-space memory is strictly inaccessible to user processes.

About Memory Killer: Low Memory Killer (LMK)

When the phone's memory usage grows too high, the LMK steps in and starts shutting down the phone's apps and services. Android groups its applications and services into the following categories:

0. Native: system kernel

1. System: system service

2. Persistent: persistent services such as telephony, Bluetooth, Wi-Fi, etc.

3. Foreground: the foreground application, the Activity currently in use

4. Perceptible: auxiliary applications, music, search, keyboard, etc.;

5. Service: services in the background thread, cloud synchronization, garbage collection, etc.;

6. Home: the home screen (launcher);

7. Previous: the last application used;

8. Cached: background, various applications that have been used before.

This is also the Android process priority ranking: the smaller the number, the higher the priority. When the LMK starts working, it kills from the lowest-priority processes upward. That is, the various Cached apps are terminated first, and Native would be reached last.

For example, after Cached processes are killed, switching back to an app in the background shows it restarting.

When Home is killed, returning to the desktop makes the launcher restart: your icons are rebuilt, or the wallpaper disappears.

When the Perceptible level is reached, your music playback or keyboard may vanish.

Further up, at the Foreground level, the current foreground application is killed and simply crashes out.

Beyond that, the phone itself restarts.

3. Android memory indicators

RSS: Resident Set Size

All memory currently used by your app. In addition to the memory your own app uses, memory from the services and shared libraries it pulls in is also counted in RSS.

PSS: Proportional Set Size

Unlike RSS, PSS divides the memory of each shared library evenly among all apps that use it. (Your own app may not allocate much, but a shared library you call may hold a large allocation, artificially inflating your app's PSS.)

USS: Unique Set Size

Only the memory unique to this app, excluding shared-library allocations.

In practice, what we mostly need to optimize is USS, with occasional attention to PSS.

4. About Unity memory

Classification of Unity memory

Unity memory is divided into Native Memory and Managed Memory. It is worth noting that Unity's memory allocation differs completely between the Editor and the Runtime: not only do allocation sizes differ, but so do the numbers shown in statistics, and even the timing and manner of allocation.

For example, in the Editor an AssetBundle is loaded into memory as soon as you open Unity, whereas at Runtime it is loaded only when used; if it is never read, it is never loaded. (Since Unity 2019, some asset-import optimizations mean unused resources are not imported.) This is because the Editor does not care about Runtime performance; it prioritizes a smooth editing experience.

For a project tens of gigabytes in size, though, the first open can take a very long time; large projects may need days or even a week to import.

Unity's memory can also be divided into engine-managed memory and user-managed memory. Engine-managed memory is generally inaccessible to developers; user-managed memory is what developers are responsible for and should prioritize.

There is also memory that Unity cannot monitor: Native memory allocated directly by the user is invisible to Unity's Profiler tools. For example:

  • Native plug-ins (C++ plug-ins) you wrote yourself: Unity cannot analyze how the compiled C++ allocates and uses memory.
  • Lua manages its memory entirely by itself, so Unity cannot track its internal usage.

5. Native Memory Best Practices

Unity overloads all memory-allocation operators (C++ malloc, new). Every allocation made through these overloads carries an extra memory label (visible in the Profiler's detailed memory snapshot, e.g. names like ShaderLab; a label records which pool the memory was allocated to).

  • When memory is allocated through the overloaded allocator, it is assigned to a pool according to its memory label, and each allocator pool does its own tracking. At Runtime we can therefore ask the allocator how many megabytes a given label holds.
  • Allocations are created under NewAsRoot (a memory "root" or "island"; the exact term is unclear in the video). A root has many sub-allocations: when a Shader is loaded into memory, a Shader root is created, and its data (SubShaders, Passes, etc.) is allocated as members of that root. Runtime statistics count roots, not members, because there are far too many members to track.
  • Because this is C++, when memory is deleted or freed it is returned to the system immediately, unlike the managed heap.

Scene

  • Unity is a C++ engine: every entity is ultimately backed by C++, not the managed heap. When we instantiate a GameObject, Unity constructs one or more native objects underneath to hold its data, such as its many Components. A Scene with too many GameObjects therefore shows a marked increase in Native memory.
  • When the Profiler shows a big jump in Native memory, check the Scene first.

Audio:

DSP Buffer: essentially an audio buffer.

  • When a sound is to be played, a command must be sent to the CPU: "play this sound". But if each piece of sound data is very small, commands are sent to the CPU very frequently, causing I/O overhead.

  • Unity's audio is built on the FMOD sound engine, which maintains a buffer; only when the buffer is full is the "play this sound" command sent to the CPU.

  • The DSP buffer can cause two kinds of problems:

    • If the buffer is set too large, the sound is delayed: it takes a while for enough audio data to accumulate to fill the buffer, so playback starts late.
    • If the buffer is too small, CPU load rises: the buffer fills and is submitted frequently, increasing overhead.

Force To Mono: force single-channel audio

When the two channels are identical, Force To Mono saves half the memory.
This is an audio import setting. Many sound designers deliver stereo audio for quality, but for perhaps 95% of sounds the left and right channels carry exactly the same data, which turns 1 MB of sound into 2 MB in both the package and memory. For games that are not especially sound-sensitive, enabling Force To Mono is recommended.

Format

For example, iOS has hardware decoding support for MP3, so MP3 decoding there is much faster (Android does not).

Compression Format

The state in which the audio data lives in memory (decompressed, compressed, etc.).

Code Size

Code must also be loaded into memory. Avoid overusing generic templates: when generics are compiled down to C++, each combination of type parameters compiles into its own copy of the code, significantly increasing code size.

You can refer to Memory Management in Unity (https://learn.unity.com/tutorial/memory-management-in-unity), the Generic Sharing part of section 3, IL2CPP & Mono.

AssetBundle

TypeTree

  • Each Unity version may change a type's data layout. To stay compatible, when Unity serializes data it also generates a TypeTree: a description of which fields the current version uses and what type each one has. Deserialization then follows the TypeTree.
    • If a field from an older version no longer exists in this one, it isn't in the TypeTree, so it is simply skipped.
    • If the data contains a newer field that this version doesn't know, it is deserialized with a default value. This ensures serialization never breaks across versions; that is the TypeTree's role.
  • There is a switch in Build AssetBundle to turn off TypeTree. When you confirm that the current AssetBundle is exactly the same as the version of Build Unity, you can turn off TypeTree at this time.
    • For example, if you use the same Unity to type AssetBundle and APP, TypeTree can be completely turned off.
  • Benefits of turning TypeTree off:
    • Less memory. The TypeTree itself is data and takes up memory.
    • Smaller bundles, because the TypeTree is otherwise serialized into the AssetBundle for easy reading.
    • Faster builds and runtime. You can see in the source code that whenever something is serialized with TypeTree enabled, it is serialized twice:
      • first the TypeTree,
      • then the actual data.
      • Deserialization does the same: 1. TypeTree deserialization, 2. actual data deserialization.
    • When you are sure your AssetBundle was built with the same Unity version as your player, you can turn TypeTree off; this reduces memory, shrinks the package, and speeds things up.

Compression method: use Lz4 instead of Lzma

  • Lz4 (https://docs.unity3d.com/Documentation/ScriptReference/BuildCompression.LZ4.html)
    • LZ4HC "chunk based" compression. Very fast.
    • Its compression ratio averages about 30% worse than LZMA, so the package body is slightly larger, but (according to the speaker) it can be more than 10x faster.
  • Lzma (https://docs.unity3d.com/2019.3/Documentation/ScriptReference/BuildCompression.LZMA.html)
    • LZMA is basically no longer used, because decompression and reading are slower.
    • It also takes up a lot of memory:
      • being stream-based rather than chunk-based, it requires a full decompression
      • chunk-based formats can decompress piece by piece
        • If a file spans chunks 5-10, LZ4 decompresses chunks 5, 6, 7, 8, 9, 10 in turn, reusing the same buffer for each chunk, which keeps peak memory low.
  • There is LZ4-based Addressables (AssetBundle) encryption in the Chinese version of Unity, which only supports LZ4. https://mp.weixin.qq.com/s/s9lQyunpRPJZnnaLSb9qOQ

Size & Count

  • The ideal AssetBundle size is something of a dark art, but packing one bundle per asset is clearly bad.
    • Compare the trick of shrinking PNGs by sharing their headers: the header (palette) is common to all, while the pixel data is not. An AssetBundle is similar: one part is its header, the other the actual packed data. If every asset gets its own bundle, the headers can end up larger than the data.
  • The official suggestion is roughly 1-2 MB per AssetBundle, chosen with network bandwidth in mind. With 5G you can consider increasing the size; it still depends on your actual users.

Resources folder

Do not use it, except for debugging.

  • Like an AssetBundle, Resources has an index header. At build time Unity builds a red-black tree that lets Resources locate where each asset lives.
  • If the Resources folder is very large, the red-black tree is also very large.
  • The red-black tree cannot be unloaded. It is loaded when the game first starts and puts continuous memory pressure on the game.
  • It can also greatly slow down the game's startup, because the game cannot start until the tree has loaded.

Texture

Upload Buffer: very similar to the audio DSP Buffer. Once filled to the configured size, its contents are pushed to the GPU.

Read/Write : Close it when not in use.

  • Do not enable Read/Write on a Texture unless necessary. Normally a texture is read into memory, parsed, placed into the upload buffer, and the CPU-side copy is then deleted.
  • But if Read/Write is enabled, the copy is kept: the texture then exists in both video memory and main memory.

Mip Maps: turn them off where they serve no purpose, such as UI textures; this can save a lot of memory.

Mesh:

Read/Write : Same as above Texture

Compression: despite the name, the actual effect may not help. In some Unity versions memory usage is worse with Compression enabled than without; you need to test it yourself.

6. Unity Managed Memory

VM memory pool

The Mono virtual machine's memory pool, which, contrary to common belief, is in fact returned to the operating system.

  • Under what conditions is it returned?
    • GC does not return memory to the system directly.
    • Memory is managed in blocks; when a block has gone untouched for six consecutive GC passes, it is returned to the system. (You will hardly ever see this happen under the Mono runtime, and only a little more often under IL2CPP.)
  • Memory is not allocated piecemeal, but in large chunks at a time.

GC mechanism

Unity's GC is the Boehm collector, which is non-generational and non-compacting. (Boehm is used for historical reasons in the Unity-Mono relationship, and Unity's current focus is IL2CPP.)

GC Mechanism Considerations

  • Throughput (reclaim capacity)
    • How much memory one collection pass recovers
  • Pause times
    • How much a collection stalls the main thread
  • Fragmentation
    • How fragmented the memory pool is left after a collection
  • Mutator overhead (extra cost)
    • Collection itself has overhead: statistics and marking work must be done
  • Scalability
    • Does it scale to multiple cores and threads without bugs?
  • Portability
    • Can it be used on different platforms?

BOEHM

  • Non-generational (not generational)


A generational collector manages large, small, and very small allocations in separate memory regions. There is also a long-lived region: when an object has not moved for a long time, it is promoted there, freeing the nursery for more frequently allocated memory.

  • Non-compacting


  • When memory is reclaimed, a compacting collector rearranges live objects to close the empty gaps shown above.
  • But Unity's Boehm does not! It is non-compacting: holes are left empty and filled by later allocations that fit.
    • Historical reasons: during Unity's cooperation with Mono, Mono was not always open source and free, so Unity chose not to upgrade its Mono, leaving it behind the upstream version.
    • Next-generation GC:
      • Incremental GC (progressive GC)
        • Today, performing a GC forces the main thread to stop while all GC memory roots ("islands"; the exact term is unclear in the video) are traversed to determine what can be collected.
        • Incremental GC spreads that pause across frames, analyzing bit by bit, so the main thread sees no spike. Total GC time is unchanged, but the GC's stalling effect on the main thread is much improved.
      • SGen, or an upgraded Boehm?
        • SGen is generational, which avoids fragmentation, can move objects, and is faster.
      • IL2CPP
        • IL2CPP's GC is now Unity's own rewrite, an upgraded version of Boehm.

Memory fragmentation

To limit memory fragmentation, load resources with large memory footprints first, then small ones (because Boehm does not compact memory); this ensures memory is used to the fullest.


  • Why has app memory gone down while the overall memory pool stays up?
    • Because a large allocation no longer fits in any single hole of the pool, even though plenty of total memory is free (the memory is severely fragmented).
  • When developers repeatedly load and release large numbers of small allocations, such as configuration tables and huge arrays, GC work increases a lot.
    • It is recommended to handle large allocations first and small ones afterwards, so memory is reused with maximum efficiency.

Zombie Memory

  • Strictly speaking, calling this a memory leak is wrong: nothing manages the memory any more, yet it has not actually leaked. It still sits in the memory pool, undead; such memory is called zombie memory.
  • Useless content:
    • A coding or team-collaboration slip loads something in that is only ever used once.
    • A developer writes a queue scheduling strategy, but writes it badly, so things they believed would be released never are.
    • Look for memory that is not actually active.
  • Not released in time.
  • Track each resource's references through code review and profiling tools.

Best Practices

1. Use Destroy() rather than just setting references to null.

2. Use more Structs.

3. Use a memory pool: VM itself has a memory pool, but it is recommended that developers build a memory pool for widgets that are frequently used. Such as UI, particle system, bullets, etc.

4. Closures and anonymous functions: use them sparingly. Every closure and anonymous function is ultimately compiled into a class.

5. Coroutines: as long as a coroutine has not been released, everything it references stays in memory. Create one when needed and discard it as soon as you are done.

6. Configuration tables: reduce one-shot loading of tables; instead of pulling in an entire table, consider whether it can be split into smaller tables.

7. Singletons: use with caution; a singleton stays in memory from game start until the game exits.

8. Unity performance analysis tool UPR: https://mp.weixin.qq.com/s/n0ERE93QQZ499Xz79eTqKA

Postscript:

I suggest Unity developers watch the original video; you will pick up many more details:
https://www.bilibili.com/video/BV1aJ411t7N6

Source: blog.csdn.net/zhenghongzhi6/article/details/116787183