Fixing Performance Problems - 2019.3

Once you've identified a performance problem in your game, how do you go about fixing it?
This article covers some common issues and optimization techniques for scripting, garbage collection, and graphics rendering.
1. Optimizing scripts in Unity games

Introduction

When our game is running, the device's central processing unit (CPU) executes instructions. Every frame of our game requires millions of CPU instructions to be executed. To maintain a smooth frame rate, the CPU must execute these instructions within a fixed amount of time. When the CPU cannot execute all of its instructions in time, our game may slow down, stutter, or freeze.

Many things can cause the CPU to have too much work to do. Examples include demanding rendering code, overly complex physics simulations, or too many animation callbacks. This article focuses on only one of these causes: CPU performance problems caused by the code we write in scripts. We'll learn how our scripts are translated into CPU instructions, what can cause scripts to generate too much work for the CPU, and how to fix performance problems caused by the code in our scripts.

Diagnosing problems with our code

Performance problems caused by excessive demands on the CPU can manifest as low frame rates, jerky performance, or intermittent freezes. However, other problems can cause similar symptoms. If our game shows performance problems like these, the first thing we must do is use Unity's Profiler window to establish whether they are due to the CPU being unable to complete its tasks in time. Once we have established this, we must determine whether the problem is caused by our scripts or by some other part of the game, such as complex physics or animation. To learn how to use Unity's Profiler window to find the cause of performance problems, follow Diagnosing Performance Problems - 2019.3.

A brief introduction to how Unity builds and runs our game

In order to understand why our code doesn't perform well, we first need to understand what happens when Unity builds our game. Knowing what goes on behind the scenes helps us make informed decisions about how to improve our game's performance.

The build process

When we build our game, Unity packages everything needed to run the game into a program that can be executed by our target device. CPUs can only run code written in very simple languages known as machine code or native code; they cannot run code written in more complex languages such as C#. This means that Unity has to translate our code into another language. This translation process is called compilation.

Unity first compiles our scripts into a language called Common Intermediate Language (CIL). CIL can easily be compiled into a variety of different native code languages. The CIL is then compiled to native code for our specific target device. This second step happens either when we build the game (known as ahead-of-time or AOT compilation) or on the target device itself, just before the code runs (known as just-in-time or JIT compilation). Whether our game uses AOT or JIT compilation usually depends on the target hardware; for example, iOS does not permit JIT compilation and requires AOT, while Mono builds on platforms such as desktop can compile just in time.

The relationship between the code we write and compiled code

Code that has not yet been compiled is called source code. The source code we write determines the structure and content of the compiled code. In most cases, well-structured and efficient source code will produce well-structured and efficient compiled code. However, it is useful to know a little about native code so that we can better understand why some source code compiles into more efficient native code.

First, some CPU instructions take longer to execute than others. An example is computing a square root. This calculation takes more CPU time than, say, multiplying two numbers together. The difference between a single fast CPU instruction and a single slow one is very small, but it helps to understand that, fundamentally, some instructions are simply faster than others.

The next thing to understand is that some operations that look very simple in source code can be surprisingly complicated once compiled. An example is inserting an element into a list. Doing this requires many more instructions than, for example, accessing an element of an array by index. Again, for a single operation the time involved is tiny, but it is important to understand that some operations generate more instructions than others.

Understanding these ideas helps us see why one piece of code performs better than another even though the two do very similar things. Even a limited background understanding of how things work at a low level can help us write games that perform well.

Runtime communication between Unity engine code and our script code

It is useful to understand that scripts written in C# run in a slightly different way than the code that makes up the Unity engine. Most of the core functionality of the Unity engine is written in C++ and has already been compiled to native code. This compiled engine code is part of what is installed when we install Unity.

Code compiled to CIL, such as our source code, is called managed code. When managed code is compiled to native code, it is integrated with what is called the managed runtime. The managed runtime takes care of things like automatic memory management and safety checks which ensure that errors in our code result in exceptions rather than device crashes. When the CPU transitions between running engine code and managed code, work must be done to set up these safety checks. When passing data from managed code back to engine code, the CPU may need to convert the data from the format used by the managed runtime to the format required by the engine code; this conversion is called marshaling. Again, the overhead of any single call between managed code and engine code is not particularly expensive, but it is important to understand that this overhead exists.
The causes of poorly-performing code

Now that we understand what happens to our code when Unity builds and runs our game, we can see that when our code performs poorly, it is because it creates too much work for the CPU at runtime. Let's consider the different possible reasons.

The first possibility is that our code is simply wasteful or poorly structured. An example might be code that calls the same function repeatedly when it could call it only once. This article covers several common examples of poor structure and gives example solutions.

The second possibility is that our code appears well structured but makes unnecessarily expensive calls to other code. An example might be code that triggers needless calls between managed code and engine code. This article gives examples of Unity API calls that can be surprisingly expensive, along with more efficient alternatives.

The next possibility is that our code is efficient, but it runs when it does not need to. An example might be code that simulates an enemy's line of sight. The code itself may perform just fine, but it is wasteful to run it while the player is far away from the enemy. This article contains example techniques that can help us write code that runs only when it needs to.

The last possibility is that our code is simply too demanding. An example might be a very detailed simulation in which a large number of agents use complex AI. If we have exhausted the other possibilities and optimized this code as much as we can, then we may simply have to redesign our game to make it less demanding: for example, faking elements of our simulation instead of calculating them. Implementing this kind of optimization is beyond the scope of this article because it is very game-dependent, but it is still worth thinking about how to make our game as performant as possible.

Improving the performance of our code

Once we've determined that the performance problems in our game are due to our code, we must think carefully about how to fix them. Optimizing a demanding function might seem like a good place to start, but the function in question may already be as efficient as it can be and simply expensive by nature. Instead, there may be a small efficiency saving we can make in a script that is used by hundreds of GameObjects; at that scale, the small saving adds up to a much more useful performance gain.

Also, improving the CPU performance of our code may come at a price: changes may increase memory usage or offload work to the GPU. For these reasons, this article is not a simple set of steps to follow. It is, rather, a series of suggestions for improving the performance of our code, with examples of how to apply them. As with all performance optimization, there are no hard and fast rules. The most important thing is to profile our game, understand the nature of the problem, try different solutions, and measure the results of our changes.

Writing efficient code

Writing efficient code and structuring it wisely can improve our game's performance. While the examples shown here are in the context of a Unity game, these general best-practice suggestions are not specific to Unity projects or Unity API calls.

Move code out of loops when possible

Loops are a common source of inefficiency, especially when they are nested. Inefficiencies are magnified when a loop runs very frequently, especially if the script containing it is used on many GameObjects in our game. In the simple example below, our code iterates through the loop every time Update() is called, regardless of whether the condition is met.

    void Update()
    {
        for(int i = 0; i < myArray.Length; i++)
        {
            if(exampleBool)
            {
                ExampleFunction(myArray[i]);
            }
        }
    }

With a simple change, the code will only iterate through the loop if the condition is met.

    void Update()
    {
        if(exampleBool)
        {
            for(int i = 0; i < myArray.Length; i++)
            {
                ExampleFunction(myArray[i]);
            }
        }
    }

This is a simple example, but it illustrates the real savings we can make. We should check for bad loop structures in the code. Consider if the code has to run every frame. Update() is a function that is run by Unity every frame. Update() is a convenient place to put code that needs to be called frequently, or that must respond to frequent changes. However, not all of this code needs to run every frame. Moving the code out of Update() so it only runs when needed is a great way to improve performance.

Only run code when things change

Let's look at a very simple example of code optimized so that it is only run when things change. In the code below, DisplayScore() is called within Update(). However, the value of the score may not change from frame to frame. This means we are calling DisplayScore() unnecessarily.

    private int score;
     
    public void IncrementScore(int incrementBy)
    {
        score += incrementBy;
    }
     
    void Update()
    {
        DisplayScore(score);
    }

With one simple change, we now ensure that DisplayScore() is only called when the value of the score changes.

    private int score;
     
    public void IncrementScore(int incrementBy)
    {
        score += incrementBy;
        DisplayScore(score);
    }

Again, the examples above are intentionally simplified, but the principle is clear. If we apply this approach in our code, we may save CPU resources.

Run code every [x] frames

If code needs to run frequently and cannot be triggered by an event, that doesn't mean it has to run every frame. In these cases, we can choose to run the code once every [x] frames. In this example code, an expensive function runs every frame.

    void Update()
    {
        ExampleExpensiveFunction();
    }

Suppose instead that it is sufficient to run this code once every 3 frames. In the code below, we use the modulo operator to ensure that the expensive function runs only on every third frame.

    private int interval = 3;
     
    void Update()
    {
        if(Time.frameCount % interval == 0)
        {
            ExampleExpensiveFunction();
        }
    }

Another benefit of this technique is that it makes it easy to spread expensive code across different frames, avoiding spikes. In the example below, each function is called once every 3 frames, and never on the same frame as the other.

    private int interval = 3;
     
    void Update()
    {
        if(Time.frameCount % interval == 0)
        {
            ExampleExpensiveFunction();
        }
        else if(Time.frameCount % interval == 1)
        {
            AnotherExampleExpensiveFunction();
        }
    }

Use caching

If our code repeatedly calls an expensive function that returns a result and then discards that result, this may be an opportunity for optimization. It can be more efficient to store and reuse a reference to that result. This technique is called caching. In Unity, it is common to call GetComponent() to access components. In the example below, we call GetComponent() in Update() to access a Renderer component, which is then passed to another function. This code works, but it is inefficient due to the repeated GetComponent() calls.

    void Update()
    {
        Renderer myRenderer = GetComponent<Renderer>();
        ExampleFunction(myRenderer);
    }

The code below calls GetComponent() only once, because the result of the call is cached. The cached result can be reused in Update() without any further calls to GetComponent().

    private Renderer myRenderer;
     
    void Start()
    {
        myRenderer = GetComponent<Renderer>();
    }
     
    void Update()
    {
        ExampleFunction(myRenderer);
    }

We should examine our code for places where functions that return a result are called frequently. We can reduce the cost of these calls by caching their results.

Use the right data structure

How we structure our data has a huge impact on how our code performs. There is no single data structure that fits all situations, so to get the best performance we need to use the right data structure for each task in our game. To decide which data structure to use, we need to understand the strengths and weaknesses of different data structures and think carefully about what we want our code to do. We may have thousands of elements that need to be iterated once per frame, or we may have a small number of elements that need to be added and removed frequently. These different problems are best solved by different data structures.

Making the right decisions here depends on our knowledge of the subject. If this is a new area, the best place to start is learning Big O notation. Big O notation is how we talk about the complexity of an algorithm, and understanding it helps us compare different data structures. From there we can learn about the data structures available to us and compare them to find the right fit for each different problem. Microsoft's guide to collections and data structures in C# gives general guidance on choosing an appropriate data structure and links to more in-depth documentation.

A single choice of data structure is unlikely to have a big impact on our game. However, in a data-driven game with a great many such collections, the results of these choices can really add up. Understanding algorithmic complexity and the strengths and weaknesses of different data structures helps us write code that performs well.
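To make the trade-off concrete, the sketch below (a hypothetical example, not from the original article) contrasts a membership test on a List, which must scan every element, with the same test on a HashSet, which is a roughly constant-time hash lookup:

```csharp
using System.Collections.Generic;

// Illustrative only: the same question ("is this id tracked?") asked of
// two different data structures with very different costs.
public static class CollectionExample
{
    // O(n): List.Contains compares against each element in turn.
    public static bool IsTrackedSlow(List<int> ids, int id)
    {
        return ids.Contains(id);
    }

    // O(1) on average: HashSet.Contains is a hash-table lookup.
    public static bool IsTrackedFast(HashSet<int> ids, int id)
    {
        return ids.Contains(id);
    }
}
```

For a handful of elements the difference is negligible; for thousands of elements queried every frame, the choice of structure dominates the cost.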

Minimize the impact of garbage collection

Garbage collection is part of how Unity manages memory. The way our code uses memory determines the frequency and CPU cost of garbage collection, so it is important to understand how garbage collection works. In the next section, we will discuss garbage collection in depth and provide several different strategies for minimizing its impact.
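As a small preview of one such strategy, the sketch below (hypothetical names, assumed to run as a Unity MonoBehaviour) reuses a single List between frames instead of allocating a new one inside Update(), which would create garbage on every frame:

```csharp
using System.Collections.Generic;
using UnityEngine;

// Sketch: keep one List alive for the lifetime of the component and
// Clear() it each frame, rather than writing "new List<...>()" in Update().
public class NearbyEnemyTracker : MonoBehaviour
{
    private readonly List<GameObject> results = new List<GameObject>();

    void Update()
    {
        results.Clear();              // reuses the existing backing array
        FindNearbyEnemies(results);   // hypothetical: fills the list in place
    }

    void FindNearbyEnemies(List<GameObject> buffer)
    {
        // ... populate buffer; no new allocation happens per frame
    }
}
```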

Use object pooling

Instantiating and destroying an object is often more expensive than deactivating and reactivating it. This is especially true if the object runs startup code, such as calls to GetComponent() in Awake() or Start(). If we need to spawn and dispose of many copies of the same object, such as the bullets in a shooting game, we may benefit from object pooling. Object pooling is a technique in which, instead of creating and destroying instances of an object, objects are temporarily deactivated and then recycled and reactivated as needed. Although object pooling is best known as a technique for managing memory usage, it is also useful for reducing excessive CPU usage. A complete guide to object pooling is beyond the scope of this article, but it is a very useful technique and well worth learning. This tutorial on object pooling on the Unity Learn site is a great guide to implementing an object pooling system in Unity.
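A minimal sketch of the idea, assuming a bulletPrefab field assigned in the Inspector (names are illustrative, not a specific Unity pooling API):

```csharp
using System.Collections.Generic;
using UnityEngine;

// Sketch of a simple object pool: bullets are deactivated and recycled
// instead of being destroyed and re-instantiated.
public class BulletPool : MonoBehaviour
{
    public GameObject bulletPrefab;                      // assigned in Inspector
    private readonly Stack<GameObject> pool = new Stack<GameObject>();

    public GameObject Spawn(Vector3 position)
    {
        // Reuse a pooled bullet if one exists; otherwise create a new one.
        GameObject bullet = pool.Count > 0 ? pool.Pop()
                                           : Instantiate(bulletPrefab);
        bullet.transform.position = position;
        bullet.SetActive(true);        // reactivate instead of re-creating
        return bullet;
    }

    public void Despawn(GameObject bullet)
    {
        bullet.SetActive(false);       // deactivate instead of destroying
        pool.Push(bullet);
    }
}
```

A production pool would also handle pre-warming and a maximum size; this sketch shows only the core recycle-and-reactivate loop.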

Avoiding expensive calls to the Unity API

Sometimes the calls our code makes to other functions or APIs can be unexpectedly expensive. There can be many reasons for this: what looks like a variable may actually be an accessor that contains additional code, fires events, or makes a call from managed code into engine code.

In this section we'll look at some examples of Unity API calls that are more expensive than they appear, and consider how we can reduce or avoid these costs. These examples illustrate different underlying causes of cost, and the suggested solutions can be applied to other, similar situations. It is important to understand that there is no definitive list of Unity API calls to avoid. Every API call is useful in some situations and less so in others. In all cases, we must profile our game carefully, identify the cause of the expensive code, and think about how to resolve it in the way that is best for our game.

SendMessage()

SendMessage() and BroadcastMessage() are very flexible functions that require little knowledge of a project's structure and are very quick to implement. As such, they are very useful for prototyping or for beginner-level scripting. However, they are extremely expensive to use. This is because these functions make use of reflection. Reflection is the term for code that examines and makes decisions about itself at runtime rather than at compile time, and code that uses reflection creates far more work for the CPU than code that does not. It is recommended that SendMessage() and BroadcastMessage() be used only for prototyping, and that other techniques be used wherever possible. For example, if we know which component we want to call a function on, we should reference that component directly and call the function that way. If we don't know which component we want to call a function on, we can consider using events or delegates.

Find()

Find() and related functions are powerful but expensive. These functions require Unity to iterate over every GameObject and Component in memory. This means they are not a particular problem in small, simple projects, but they become increasingly expensive to use as a project grows in complexity. It is best to use Find() and similar functions infrequently and to cache the results where possible. Some simple techniques that can help reduce the use of Find() in our code include setting references to objects in the Inspector panel where possible, or creating a script that manages references to frequently searched-for objects.
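As a sketch of the events-or-delegates alternative to SendMessage() mentioned above (plain C# with illustrative class names, not a specific Unity API):

```csharp
using System;

// Sketch: a C# event replaces a costly BroadcastMessage("OnPlayerDied")
// call. Subscribers register once and are invoked directly, with no
// reflection at call time.
public class Player
{
    public event Action Died;

    public void Kill()
    {
        Died?.Invoke();   // direct delegate invocation, not reflection
    }
}

public class ScoreBoard
{
    public bool gameOverShown;

    public ScoreBoard(Player player)
    {
        // Subscribe once; no per-call lookup of a method by name.
        player.Died += () => gameOverShown = true;
    }
}
```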

Transform

Setting a transform's position or rotation causes an internal OnTransformChanged event to propagate to all of that transform's children. This means that setting a transform's position and rotation values is relatively expensive, especially on transforms that have many children. To limit the number of these internal events, we should avoid setting these properties more often than necessary. For example, we might perform one calculation in Update() to set a transform's x position and then another to set its z position. In this case, we should consider copying the transform's position into a Vector3, performing the required calculations on that Vector3, and then setting the transform's position to the value of that Vector3. This causes only a single OnTransformChanged event.

Transform.position is an example of an accessor that computes its result behind the scenes. It can be contrasted with Transform.localPosition: the value of localPosition is stored in the transform, so calling Transform.localPosition simply returns that value, whereas every call to Transform.position recalculates the transform's world position. If our code makes frequent use of Transform.position, using Transform.localPosition instead where possible reduces the number of CPU instructions and can ultimately improve performance. Where we do need Transform.position frequently, we should cache its value wherever we can.
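The single-write suggestion above might be sketched like this (CalculateXPosition() and CalculateZPosition() are hypothetical placeholder calculations):

```csharp
using UnityEngine;

// Sketch: compute into a local Vector3 and write transform.position once,
// so only one OnTransformChanged event fires per frame instead of two.
public class MoveExample : MonoBehaviour
{
    void Update()
    {
        Vector3 pos = transform.position;   // read (and compute) once
        pos.x = CalculateXPosition();       // work on the local copy
        pos.z = CalculateZPosition();
        transform.position = pos;           // single write, single event
    }

    float CalculateXPosition() { return Mathf.Sin(Time.time); }
    float CalculateZPosition() { return Mathf.Cos(Time.time); }
}
```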

Update(), LateUpdate(), and other event functions look like simple functions, but they have a hidden overhead. These functions require communication between engine code and managed code every time they are called. In addition, Unity performs a number of safety checks before calling them; these checks ensure, for example, that the GameObject is in a valid state and has not been destroyed. This overhead is not particularly large for any single call, but in a game with thousands of MonoBehaviours it can add up.

For this reason, empty Update() functions are particularly wasteful. We might assume that, because the function is empty and our code contains no direct calls to it, the empty function will not run. This is not the case: behind the scenes, the safety checks and native calls still happen even when the body of the Update() function is empty. To avoid wasted CPU time, we should make sure our game contains no empty Update() functions. If our game has a great many active MonoBehaviours with Update() functions, we may benefit from structuring our code differently to reduce this overhead. The Unity blog post on this subject goes into much more detail.
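A hedged sketch of the kind of manager pattern such restructuring might use: one real Update() dispatches to many plain objects, so the engine-to-managed transition and its safety checks happen once per frame rather than once per object. The names here are illustrative, not a Unity API:

```csharp
using System.Collections.Generic;
using UnityEngine;

// Objects that want per-frame updates implement this instead of
// having their own MonoBehaviour Update().
public interface IUpdatable
{
    void ManagedUpdate();
}

// A single MonoBehaviour whose one Update() fans out to all registered
// objects via ordinary managed calls.
public class UpdateManager : MonoBehaviour
{
    private readonly List<IUpdatable> targets = new List<IUpdatable>();

    public void Register(IUpdatable target)   { targets.Add(target); }
    public void Unregister(IUpdatable target) { targets.Remove(target); }

    void Update()
    {
        // One engine callback, however many objects need updating.
        for (int i = 0; i < targets.Count; i++)
        {
            targets[i].ManagedUpdate();
        }
    }
}
```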

Vector2 and Vector3

We know that some operations simply generate more CPU instructions than others. Vector math operations are an example: they are inherently more complex than float or int operations. Although the difference in actual execution time between two such calculations is tiny, at a large enough scale these operations can affect performance. It's common and convenient to do math using Unity's Vector2 and Vector3 structs, especially when working with transforms. If we perform many frequent Vector2 and Vector3 math operations in our code, for example in nested loops in the Update() of many GameObjects, we may be creating unnecessary work for the CPU. In these cases we may be able to save performance by performing int or float calculations instead.

Earlier in this article, we learned that the CPU instructions required to calculate a square root are slower than those required for, say, a simple multiplication. Vector2.magnitude and Vector3.magnitude are both examples of this, as they involve a square root calculation. In addition, Vector2.Distance and Vector3.Distance use magnitude behind the scenes. If our game makes extensive and frequent use of magnitude or Distance, we may be able to avoid the relatively expensive square root calculation by using Vector2.sqrMagnitude and Vector3.sqrMagnitude instead and comparing against squared values. Again, replacing a single call makes only a tiny difference, but at a large enough scale the performance saving can be useful.

Camera.main

Camera.main is a convenient Unity API call that returns a reference to the first enabled Camera component tagged "MainCamera". This is another example of something that looks like a variable but is actually an accessor. In this case, the accessor calls an internal function similar to Find() behind the scenes. Camera.main therefore suffers from the same problem as Find(): it searches through every GameObject and Component in memory and can be very expensive to use.
To avoid this potentially expensive call, we should either cache the result of Camera.main or avoid using it entirely and manage references to our cameras manually.
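A minimal sketch of the caching approach (the Update() body is a hypothetical use of the cached reference):

```csharp
using UnityEngine;

// Sketch: pay for the Camera.main lookup once in Start(), then reuse
// the cached reference every frame.
public class CameraUser : MonoBehaviour
{
    private Camera mainCamera;

    void Start()
    {
        mainCamera = Camera.main;   // expensive search, done only once
    }

    void Update()
    {
        // Reading the cached field costs almost nothing.
        Vector3 toCamera = mainCamera.transform.position - transform.position;
    }
}
```

Note that a cached reference becomes stale if the main camera is destroyed or swapped at runtime; in that case a small manager script that tracks the current camera is the more robust option.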

Other Unity API calls and further optimizations

We have considered some examples of common Unity API calls that can be unexpectedly expensive and learned about the different causes behind these costs. However, this is not an exhaustive list of ways to improve the efficiency of Unity API calls. This article on performance in Unity is an extensive guide to Unity optimization, and it includes many other Unity API optimizations that we may find useful. It also discusses further optimizations in depth that are beyond the scope of this relatively introductory article.

Running code only when it needs to run

There is a saying in programming: "the fastest code is the code that doesn't run". Often the most efficient way to solve a performance problem is not to use an advanced technique, but simply to remove code that doesn't need to be there in the first place. Let's look at a couple of examples of where we can make this saving.

Culling

Unity includes code that checks whether an object is within a camera's frustum. If an object is not within the frustum of any camera, code related to rendering that object does not run. This technique is called frustum culling. We can take a similar approach to the code in our scripts: if we have code related to an object's visual state, we probably don't need to execute it when the object cannot be seen by the player. In a complex scene with many objects, this can lead to considerable performance savings.

In the simplified example code below, we have an example of a patrolling enemy. Every time Update() is called, the script controlling this enemy calls two example functions: one related to moving the enemy, and one related to its visual state.

    void Update()
    {
        UpdateTransformPosition();
        UpdateAnimations();
    }

In the code below, we now check if the enemy's renderer is within any camera's frustum. Code related to the enemy's visibility state only runs when the enemy is visible.

    private Renderer myRenderer;
     
    void Start()
    {
        myRenderer = GetComponent<Renderer>();
    }
     
    void Update()
    {
        UpdateTransformPosition();
     
        if (myRenderer.isVisible)
        {
            UpdateAnimations();
        }
    }

There are several ways to disable code when the player cannot see the object. If we know that certain objects in the scene are not visible at certain points in the game, we can disable them manually. When we are less certain and need to calculate visibility, we can use an approximate calculation (for example, checking whether the object is behind the player), functions such as OnBecameVisible() and OnBecameInvisible(), or a more detailed raycast. The best implementation depends very much on our game, and experimentation and profiling are essential.
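As one example, a sketch using OnBecameVisible() and OnBecameInvisible() to track visibility (the two Update helper bodies are placeholders):

```csharp
using UnityEngine;

// Sketch: Unity calls OnBecameVisible()/OnBecameInvisible() on a component
// whose GameObject has a Renderer, whenever any camera starts or stops
// seeing it. We use that to skip visual-state work while unseen.
public class VisibilityGate : MonoBehaviour
{
    private bool isVisible;

    void OnBecameVisible()   { isVisible = true; }
    void OnBecameInvisible() { isVisible = false; }

    void Update()
    {
        UpdateTransformPosition();     // always runs: movement continues

        if (isVisible)
        {
            UpdateAnimations();        // only runs while a camera sees us
        }
    }

    void UpdateTransformPosition() { /* movement logic */ }
    void UpdateAnimations()        { /* visual-state logic */ }
}
```

One caveat: in the Editor, the Scene view camera also counts as "a camera", so this behaves slightly differently in the Editor than in a build.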

Level of detail

Level of detail, also known as LOD, is another common rendering optimization technique. Objects close to the player are rendered at full fidelity, with detailed meshes and textures, while distant objects use less detailed meshes and textures. Our code can use a similar approach. For example, we might have an enemy whose AI script determines its behavior. Part of that behavior may involve expensive operations for determining what it can see and hear and how it should react to that input. We could use a level-of-detail system to enable and disable these expensive operations based on the enemy's distance from the player. In a scene with many such enemies, we can save considerable performance if only the nearest enemies perform the most expensive operations. Unity's CullingGroup API lets us hook into Unity's LOD system to optimize our code; the manual page for the CullingGroup API contains some examples of how to use it. As always, we should test, profile, and find the solution that is right for our game.

We have now seen what happens to the code we write when a Unity game is built and run, why our code can cause performance problems, and how to minimize the impact of engine overhead. We've looked at some common causes of performance problems in our code and considered a number of different solutions. Using this knowledge and our profiling tools, we should now be able to diagnose, understand, and fix code-related performance problems in our games.
2. Optimizing garbage collection in Unity games

When our game is running, it uses memory to store data. When this data is no longer needed, the memory that stored it is freed so it can be reused. "Garbage" refers to memory that has been used to store data but is no longer used. Garbage collection is the name of the process that makes that memory available again for reuse.

Unity uses garbage collection as part of its memory management. If garbage collection happens too often, or if it has too much work to do, our game may perform poorly, which means garbage collection is a common cause of performance problems. In this article, we'll learn how garbage collection works, when it happens, and how to use memory efficiently so that we minimize its impact on our game.

Diagnosing problems with garbage collection

Performance issues caused by garbage collection can manifest as low frame rates, erratic performance, or intermittent freezes. However, other problems can cause similar symptoms. If our game has performance issues like this, the first thing we should do is use Unity's Profiler window to determine if the issues we're seeing are actually due to garbage collection. To learn how to use the Profiler window to find the cause of performance problems, follow this tutorial.

A brief introduction to memory management in Unity

In order to understand how garbage collection works, and when garbage collection occurs, we must first understand how memory usage works in Unity. First, we have to understand that Unity runs its own core engine code differently than it runs code we write in scripts.

When Unity runs its own core Unity Engine code, the way Unity manages memory is known as manual memory management. This means that the core engine code must explicitly declare how memory is used. Manual memory management does not use garbage collection and will not be discussed in depth in this article. The way Unity manages memory when running our code is called automatic memory management. This means that our code does not need to explicitly tell Unity how to manage memory in detail. Unity solves this problem for us. At the most basic level, Unity's automatic memory management works like this:

1. Unity has access to two pools of memory: the stack and the heap (also known as the managed heap). The stack is used for short-term storage of small pieces of data, while the heap is used for long-term storage and for larger pieces of data.

2. When a variable is created, Unity will request a block of memory from the stack or heap.

3. As long as the variable is in scope (our code can still access it), the memory allocated to it is still in use. We say that this memory has been allocated. We describe variables in stack memory as objects on the stack, and variables in heap memory as objects on the heap.

4. When a variable goes out of scope, its memory is no longer needed and can be returned to the pool it came from. When memory is returned to its pool, we say that the memory has been freed. Memory on the stack is freed as soon as the variable it refers to goes out of scope. Memory on the heap, however, is not freed at this point; it remains allocated even though the variable it refers to has gone out of scope.

5. The garbage collector identifies and releases unused heap memory. The garbage collector runs periodically to clean up the heap.

Now that we understand the flow of events, let's take a closer look at stack allocation and deallocation, then at heap allocation and deallocation.

What happens during stack allocation and deallocation?

Stack allocation and deallocation are fast and simple. This is because the stack is only used to store small data for a short period of time. Allocations and deallocations always happen in a predictable order and are of predictable size.

A stack works like a stack data type: it is a simple collection of elements (blocks of memory in this case) where elements can only be added and removed in a strict order. This simplicity and rigor is what makes it so fast: when a variable is stored on the stack, its memory is only allocated from the "end" of the stack. When a stack variable goes out of scope, the memory used to store that variable is immediately returned to the stack for reuse.

What happens during heap allocation?

Heap allocation is much more complicated than stack allocation. This is because the heap can be used to store long-term and short-term data, and many different types and sizes of data. Allocation and deallocation do not always happen in a predictable order, and memory blocks of very different sizes may be required. When creating a heap variable, the following steps need to be performed:

1. Unity has to check if there is enough free memory in the heap. If there is enough free memory in the heap, the variable's memory is allocated.

2. If there is not enough free memory in the heap, Unity will trigger the garbage collector to release unused heap memory. This can be a slow operation. If there is now enough free memory in the heap, the variable's memory is allocated.

3. If there is not enough free memory in the heap after garbage collection, Unity will increase the memory in the heap. This can be a slow operation. Then allocate memory for the variable. Heap allocation can be slow, especially if the garbage collector has to be run and the heap has to be extended.

What happens during garbage collection?

When a heap variable goes out of scope, the memory used to store it is not released immediately. Unused heap memory is only freed when the garbage collector runs. Every time the garbage collector runs, the following steps are performed:

1. The garbage collector examines every object on the heap.

2. The garbage collector will search all current object references to determine if the object on the heap is still in scope.

3. Any objects that are not in scope are marked for deletion.

4. The marked objects are deleted and the memory allocated to them is returned to the heap. Garbage collection can be an expensive operation. The more objects on the heap, the more work it has to do, and the more object references in your code, the more work it has to do.

When does garbage collection happen?

There are three situations that will cause the garbage collector to run:

1. Whenever a heap allocation is requested, if it cannot be done using free memory in the heap, the garbage collector runs.

2. The garbage collector runs automatically from time to time (although the frequency varies by platform).

3. The garbage collector can be forced to run manually.

Garbage collection can be a frequent operation. The garbage collector is triggered whenever a heap allocation cannot be completed from the heap's free memory, which means that frequent heap allocations and deallocations can lead to frequent garbage collection.

Problems with garbage collection

Now that we understand the role garbage collection plays in Unity's memory management, we can consider the types of problems that can arise. The most obvious problem is that the garbage collector can take a considerable amount of time to run. If the garbage collector has a lot of objects on the heap and/or needs to check a lot of object references, the process of checking all of them can be slow. This can cause our game to stutter or run slowly.

Another problem is that the garbage collector may run at inconvenient times. If the CPU is already hard at work in a performance-critical part of the game, even a small amount of additional overhead from garbage collection can cause frame rates to drop and performance to change noticeably. Another, less obvious problem is heap fragmentation. When memory is allocated from the heap, it is taken from the free space in blocks of different sizes, depending on the size of the data that must be stored. When these blocks of memory are returned to the heap, the free space can end up split into many small blocks separated by allocated blocks. This means that although the total amount of free memory may be high, we cannot allocate large blocks of memory without running the garbage collector and/or expanding the heap, because none of the existing free blocks are large enough. Heap fragmentation has two consequences: the first is that our game's memory usage will be higher than it needs to be, and the second is that the garbage collector will run more often. For a more detailed discussion of heap fragmentation, see this Unity best practice guide on performance.

Finding heap allocations

If we know that garbage collection is causing problems in our game, we need to know which parts of our code are generating garbage. Garbage is generated when a heap variable goes out of scope, so first we need to know what causes a variable to be allocated on the heap.

What is allocated on the stack and the heap?

In Unity, local variables of value types are allocated on the stack, and everything else is allocated on the heap. The code below is an example of a stack allocation, because the variable localInt is both local and of a value type. The memory allocated for this variable will be freed from the stack immediately after the function finishes running.

    void ExampleFunction()
    {
        int localInt = 5;
    }

The following code is an example of a heap allocation, because the variable localList is local but of a reference type. The memory allocated for this variable will not be freed until the garbage collector runs.

    void ExampleFunction()
    {
        List<int> localList = new List<int>();
    }

Using the Profiler window to find heap allocations

We can use the Profiler window to see where our code creates heap allocations. You can access this window by going to Window > Analysis > Profiler (Fig. 01).

With the CPU Usage Profiler selected, we can select any frame at the bottom of the Profiler window to view CPU usage data for that frame. One of the columns of data is called GC Alloc; this column shows the heap allocations made in that frame. If we select the column header, we can sort the data by this statistic, making it easy to see which functions in our game cause the most heap allocations. Once we know which function causes a heap allocation, we can inspect that function. Once we know what code in the function causes the garbage, we can decide how to fix it and minimize the amount of garbage generated.

Reducing the impact of garbage collection

Generally speaking, we can reduce the impact of garbage collection on the game in the following three ways:

1. We can reduce the time it takes for the garbage collector to run.

2. We can reduce how often the garbage collector runs.

3. We can intentionally trigger the garbage collector to run at non-performance critical times, such as during loading screens. With that in mind, here are three strategies that can help:

1. We can organize our game so that we have fewer heap allocations and fewer object references. There are fewer objects on the heap and fewer references to check, which means that when garbage collection is triggered, it takes less time to run.

2. We can reduce the frequency of heap allocations and deallocations, especially at performance-critical moments. Fewer allocations and deallocations mean fewer chances to trigger garbage collection. This also reduces the risk of heap fragmentation.

3. We can try to time garbage collection and heap expansion so that they occur at predictable and convenient times. This is a more difficult and less reliable approach, but can reduce the impact of garbage collection when used as part of an overall memory management strategy.

Reducing the amount of garbage created

Caching

If our code repeatedly calls functions that cause heap allocations, and then discards the results, this will create unnecessary garbage. Instead, we should store references to these objects and reuse them. This technique is called caching. In the example below, every call to the code results in a heap allocation. This is because a new array is created.

    void OnTriggerEnter(Collider other)
    {
        Renderer[] allRenderers = FindObjectsOfType<Renderer>();
        ExampleFunction(allRenderers);
    }

The code below only results in one heap allocation, because the array is created and filled only once, and then cached. Cached arrays can be used over and over again without generating more garbage.

    private Renderer[] allRenderers;
     
    void Start()
    {
        allRenderers = FindObjectsOfType<Renderer>();
    }
     
     
    void OnTriggerEnter(Collider other)
    {
        ExampleFunction(allRenderers);
    }

Don't allocate in functions that are called frequently

If we must allocate heap memory in a MonoBehaviour, the worst place to do so is in a frequently run function. Update() and LateUpdate(), for example, are called every frame, so if our code generates garbage here it will quickly add up. We should consider caching references to objects in Start() or Awake() where possible, or ensuring that code which causes allocations only runs when it needs to.

Let's look at a very simple example of restructuring code so that it only runs when something changes. In the code below, every call to Update() calls a function that causes an allocation, frequently creating garbage:

    void Update()
    {
        ExampleGarbageGeneratingFunction(transform.position.x);
    }

With one simple change, we can ensure that the allocating function is only called when the value of transform.position.x has changed. We now only make heap allocations when necessary, rather than every frame.

    private float previousTransformPositionX;
     
    void Update()
    {
        float transformPositionX = transform.position.x;
        if (transformPositionX != previousTransformPositionX)
        {
            ExampleGarbageGeneratingFunction(transformPositionX);
            previousTransformPositionX = transformPositionX;
        }
    }

Another technique to reduce the garbage generated in Update() is to use timers. This applies to code that generates garbage and must run periodically, but not necessarily every frame. In the sample code below, the garbage-generating function runs every frame:

    void Update()
    {
        ExampleGarbageGeneratingFunction();
    }

In the code below, we use a timer to ensure that the garbage-generating function runs only once per second:

    private float timeSinceLastCalled;
     
    private float delay = 1f;
     
    void Update()
    {
        timeSinceLastCalled += Time.deltaTime;
        if (timeSinceLastCalled > delay)
        {
            ExampleGarbageGeneratingFunction();
            timeSinceLastCalled = 0f;
        }
    }

Making small changes like this to frequently run code can drastically reduce the amount of garbage generated.

Clearing collections

Creating a new collection results in an allocation on the heap. If we find that new collections are being created more than once in our code, we should cache references to collections and use Clear() to empty their contents instead of calling new repeatedly. In the example below, a new heap allocation occurs every time new is used.

    void Update()
    {
        List<int> myList = new List<int>();
        PopulateList(myList);
    }

In the example below, allocations are only made when the collection is created or when the collection must be resized in the background. This greatly reduces the amount of garbage generated.

    private List<int> myList = new List<int>();
     
    void Update()
    {
        myList.Clear();
        PopulateList(myList);
    }

Object pooling

Even if we reduce the allocations in our scripts, we will still have garbage collection problems if we create and destroy a lot of objects at runtime. Object pooling is a technique for reducing allocations and deallocations by reusing objects instead of repeatedly creating and destroying them. Object pools are used extensively in games and are best suited to situations where we frequently spawn and destroy similar objects, for example, when shooting bullets from a gun. A full guide to object pooling is beyond the scope of this article, but it is a very useful technique worth learning. A tutorial on object pooling is available at https://learn.unity.com/tutorial/introduction-to-object-pooling-2019-3?language=en
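The full pattern is covered in the linked tutorial, but a minimal sketch of the idea, assuming a hypothetical bulletPrefab field and illustrative Spawn()/Despawn() helpers, might look like this:

    using System.Collections.Generic;
    using UnityEngine;
     
    public class BulletPool : MonoBehaviour
    {
        // Hypothetical prefab reference; assign it in the Inspector.
        public GameObject bulletPrefab;
        private Queue<GameObject> pool = new Queue<GameObject>();
     
        // Reuse an inactive bullet if one is available; otherwise create one.
        public GameObject Spawn(Vector3 position)
        {
            GameObject bullet = pool.Count > 0 ? pool.Dequeue() : Instantiate(bulletPrefab);
            bullet.transform.position = position;
            bullet.SetActive(true);
            return bullet;
        }
     
        // Instead of destroying the bullet, deactivate it and return it to the pool.
        public void Despawn(GameObject bullet)
        {
            bullet.SetActive(false);
            pool.Enqueue(bullet);
        }
    }

Because bullets are deactivated and reused rather than destroyed, no new allocations occur once the pool has warmed up.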

Common causes of unnecessary heap allocations

We know that local, value-type variables are allocated on the stack, while everything else is allocated on the heap. However, there are many situations where heap allocations can surprise us. Let's look at some common causes of unnecessary heap allocations and consider how best to reduce them.

Strings

In C#, strings are reference types, not value types, even though they seem to hold the "value" of the string. This means that creating and discarding strings produces garbage, and because strings are used so widely, this garbage can really add up. Strings in C# are also immutable, which means their value cannot change after they are first created. Every time we manipulate a string (for example, by concatenating two strings with the + operator), Unity creates a new string with the updated value and discards the old one. This creates garbage. We can follow some simple rules to minimize string garbage. Let's consider these rules, then look at an example of how to apply them.

1. We should reduce unnecessary string creation. If we use the same string value multiple times, we should create the string once and cache the value.

2. We should reduce unnecessary string operations. For example, if we have a text component that is updated frequently and contains a concatenated string, then we might consider splitting it into two text components.

3. If we have to build strings at runtime, we should use the StringBuilder class. The StringBuilder class is designed for building strings without allocations, and it will reduce the amount of garbage generated when concatenating complex strings.

Let's examine a code example that generates unnecessary garbage through inefficient use of strings. In the code below, we create the displayed string in Update() by concatenating the string "TIME:" with the value of the float timer. This generates unnecessary garbage.

    public Text timerText;
    private float timer;
     
    void Update()
    {
        timer += Time.deltaTime;
        timerText.text = "TIME:" + timer.ToString();
    }

In the example below, we've improved things considerably. We put the word "TIME:" in a separate text component and set its value in Start(). This means that in Update(), we no longer need to combine strings. This greatly reduces the amount of garbage generated.

    public Text timerHeaderText;
    public Text timerValueText;
    private float timer;
     
    void Start()
    {
        timerHeaderText.text = "TIME:";
    }
     
    void Update()
    {
        timer += Time.deltaTime;
        timerValueText.text = timer.ToString();
    }
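If the displayed value really must be rebuilt every frame, the StringBuilder class mentioned in the rules above can help. The sketch below reuses a cached StringBuilder; the class and field names are illustrative. Note that the final ToString() call still allocates one string per frame, but the garbage from the intermediate concatenations is avoided:

    using System.Text;
    using UnityEngine;
    using UnityEngine.UI;
     
    public class TimerDisplay : MonoBehaviour
    {
        public Text timerValueText;
        private float timer;
        // Cached so that the builder's internal buffer is reused each frame.
        private StringBuilder builder = new StringBuilder(16);
     
        void Update()
        {
            timer += Time.deltaTime;
            builder.Length = 0;                     // empty the builder without allocating
            builder.Append("TIME:");
            builder.Append(Mathf.FloorToInt(timer));
            timerValueText.text = builder.ToString();
        }
    }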

Unity function calls

It's important to note that we can generate garbage when we call code we didn't write ourselves, whether in Unity itself or in plugins. Some Unity function calls create heap allocations, so they should be used sparingly to avoid generating unnecessary garbage. There is no list of functions we should avoid; every function can be useful in some situations and less useful in others. As ever, it's best to profile our game carefully, identify where garbage is being generated, and think carefully about how to handle it. In some cases, it may be wise to cache the result of a function; in other cases, it may be wise to call the function less often; in other cases, it may be wise to refactor our code to use a different function. Having said that, let's look at a couple of common examples of Unity functions that cause heap allocations and consider how best to deal with them. Every time we access a Unity function that returns an array, a new array is created and passed to us as the return value. This behavior isn't always obvious or expected, especially when the function is an accessor (for example, Mesh.normals).

In the code below, a new array is created for each iteration of the loop.

    void ExampleFunction()
    {
        for (int i = 0; i < myMesh.normals.Length; i++)
        {
            Vector3 normal = myMesh.normals[i];
        }
    }


It's easy to reduce allocations in this case: we can simply cache a reference to the array. When we do this, only one array is created, and the amount of garbage created is correspondingly reduced. The code below demonstrates this: we call Mesh.normals before the loop runs and cache the reference, so that only one array is created.

    void ExampleFunction()
    {
        Vector3[] meshNormals = myMesh.normals;
        for (int i = 0; i < meshNormals.Length; i++)
        {
            Vector3 normal = meshNormals[i];
        }
    }

Another unexpected cause of heap allocations can be found in the functions GameObject.name and GameObject.tag. Both of these accessors return new strings, which means that calling them generates garbage. Caching the value can be useful, but in this case there is a related Unity function we can use instead. To check a GameObject's tag without generating garbage, we can use GameObject.CompareTag(). In the sample code below, garbage is created by calling GameObject.tag:

    private string playerTag = "Player";
     
    void OnTriggerEnter(Collider other)
    {
        bool isPlayer = other.gameObject.tag == playerTag;
    }

If we use GameObject.CompareTag(), this function will no longer generate any garbage:

    private string playerTag = "Player";
     
    void OnTriggerEnter(Collider other)
    {
        bool isPlayer = other.gameObject.CompareTag(playerTag);
    }

GameObject.CompareTag is not unique; many Unity function calls have alternate versions that do not cause heap allocations. For example, we could use Input.GetTouch() and Input.touchCount in place of Input.touches, or Physics.SphereCastNonAlloc() in place of Physics.SphereCastAll().
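As an example, a sketch of Physics.SphereCastNonAlloc() with a pre-allocated results buffer might look like the following (the buffer size of 32 and the cast parameters are arbitrary choices for illustration):

    using UnityEngine;
     
    public class SphereCastExample : MonoBehaviour
    {
        // Pre-allocated buffer, reused on every call, so no new array is created.
        private RaycastHit[] hitBuffer = new RaycastHit[32];
     
        void FixedUpdate()
        {
            // Returns the number of hits written into the buffer.
            int hitCount = Physics.SphereCastNonAlloc(
                transform.position, 0.5f, transform.forward, hitBuffer, 100f);
     
            for (int i = 0; i < hitCount; i++)
            {
                // Process hitBuffer[i] here.
            }
        }
    }

Unlike Physics.SphereCastAll(), which returns a new array every call, the NonAlloc version writes into the buffer we supply.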

Boxing

Boxing is what happens when a value-type variable is used in place of a reference-type variable. It usually occurs when we pass a value-type variable (such as an int or a float) to a function that takes an object parameter (such as Object.Equals()). For example, the function String.Format() takes a string and object arguments. When we pass it a string and an integer, the integer must be boxed. Therefore, the following code contains an example of boxing:

    void ExampleFunction()
    {
        int cost = 5;
        string displayString = String.Format("Price: {0} gold", cost);
    }

Boxing produces garbage because of what happens behind the scenes. When a value-type variable is boxed, Unity creates a temporary System.Object on the heap to wrap the value-type variable. A System.Object is a reference-type variable, so when this temporary object is disposed of, garbage is created. Boxing is a very common cause of unnecessary heap allocations. Even if we don't box variables directly in our code, we may use plugins that cause boxing, or it may happen behind the scenes in other functions. Best practice is to avoid boxing wherever possible and to remove any function calls that cause it.
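One way to sidestep the boxing in the String.Format() example above is to convert the value type to a string ourselves before combining it. This sketch still allocates strings, but it no longer creates a boxed System.Object:

    void ExampleFunction()
    {
        int cost = 5;
        // cost.ToString() creates a string (still a heap allocation),
        // but the int itself is no longer boxed into a System.Object.
        string displayString = "Price: " + cost.ToString() + " gold";
    }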

Coroutines

Calling StartCoroutine() produces a small amount of garbage, because Unity must create an instance of a class to manage the coroutine. With that in mind, calls to StartCoroutine() should be limited while our game is interactive and performance is a concern. To reduce garbage created this way, any coroutines that must run at performance-critical times should be started in advance, and we should be particularly careful with nested coroutines, which may contain delayed calls to StartCoroutine(). A yield statement within a coroutine does not itself create a heap allocation; however, the values we pass with the yield statement can create unnecessary heap allocations. For example, the following code creates garbage:

yield return 0;

This code creates garbage because an int with the value 0 is boxed. In this case, if we simply wish to wait for one frame without causing any heap allocations, the best approach is to use the following code:

yield return null;

 

Another common mistake with coroutines is using new when yielding the same value more than once. For example, the following code will create and release a WaitForSeconds object on each loop iteration:

    while (!isComplete)
    {
        yield return new WaitForSeconds(1f);
    }

If we cache and reuse the WaitForSeconds object, much less garbage is created. The code below shows an example:

    WaitForSeconds delay = new WaitForSeconds(1f);
     
    while (!isComplete)
    {
        yield return delay;
    }

If our code is generating a lot of garbage due to coroutines, we might consider refactoring our code to use something other than coroutines. Refactoring code is a complex topic and every project is unique, but there are some common alternatives to coroutines that we might want to keep in mind. For example, if we primarily use coroutines to manage time, we may wish to simply keep track of time in the Update() function. If we're primarily using coroutines to control the order in which events happen in our game, we'll probably want to create some sort of messaging system to allow objects to communicate. There's no one-size-fits-all approach to this problem, but it's useful to remember that there are often multiple ways to achieve the same thing in code.

Foreach loops

In versions of Unity prior to 5.5, a foreach loop that iterated over anything other than an array generated garbage each time the loop terminated. This was due to boxing happening behind the scenes: a System.Object was allocated on the heap when the loop began and released when it ended. This issue was fixed in Unity 5.5. For example, in versions prior to Unity 5.5, the loop in the following code would generate garbage:

    void ExampleFunction(List<int> listOfInts)
    {
        foreach (int currentInt in listOfInts)
        {
                DoSomething(currentInt);
        }
    }

As long as you are on Unity 2019.3 you are safe, but if we cannot upgrade our Unity version there is an easy solution: for and while loops do not cause boxing behind the scenes and therefore generate no garbage, so we should prefer them when iterating over collections that are not arrays. The loop in the following code does not generate garbage:

    void ExampleFunction(List<int> listOfInts)
    {
        for (int i = 0; i < listOfInts.Count; i++)
        {
            int currentInt = listOfInts[i];
            DoSomething(currentInt);
        }
    }

Function references

References to functions, whether to anonymous methods or to named methods, are reference-type variables in Unity. They result in heap allocations. Converting an anonymous method to a closure (where the anonymous method has access to the variables in scope when it was created) significantly increases memory usage and the number of heap allocations. The exact details of how function references and closures allocate memory depend on the platform and compiler settings, but if garbage collection is a concern, it's best to minimize the use of function references and closures during gameplay. This Unity best practice guide covers the technical details of this topic in more depth: https://docs.unity3d.com/Manual/BestPracticeUnderstandingPerformanceInUnity4-1.html?_ga=2.173677475.1457923406.1588487957-163088388.1588307012
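As an illustration of the difference, consider the sketch below; as noted above, the exact allocation behavior depends on the platform and compiler settings:

    using System;
    using System.Collections.Generic;
    using UnityEngine;
     
    public class ClosureExample : MonoBehaviour
    {
        private List<Action> actions = new List<Action>();
     
        void Start()
        {
            int localValue = 5;
     
            // This anonymous method captures localValue, so the compiler
            // generates a hidden class on the heap to hold the captured
            // variable: a closure allocation.
            actions.Add(() => Debug.Log(localValue));
     
            // An anonymous method that captures nothing can often be cached
            // by the compiler, so it may not allocate again on repeated use.
            actions.Add(() => Debug.Log("no capture"));
        }
    }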

LINQ and Regular Expressions

Both LINQ and regular expressions produce garbage because boxing happens behind the scenes. Best practice is to avoid them where performance is a concern.
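As a sketch of what that refactoring can look like, the LINQ query below allocates enumerators and a new list on every call, while the plain loop reuses a caller-supplied list (the threshold of 10 is arbitrary):

    using System.Collections.Generic;
    using System.Linq;
     
    public class FilterExample
    {
        // LINQ version: allocates an enumerator and a new list on every call.
        public List<int> FilterWithLinq(List<int> values)
        {
            return values.Where(v => v > 10).ToList();
        }
     
        // Loop version: reuses a cached list, so once the list has grown
        // to its working size, no further garbage is created.
        public void FilterWithLoop(List<int> values, List<int> results)
        {
            results.Clear();
            for (int i = 0; i < values.Count; i++)
            {
                if (values[i] > 10)
                {
                    results.Add(values[i]);
                }
            }
        }
    }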

Structuring our code to minimize the impact of garbage collection

The way our code is structured can affect garbage collection. Even if our code doesn't create heap allocations, it can add to the garbage collector's workload. One way our code can unnecessarily increase that workload is by asking the collector to examine things it shouldn't have to. Structs are value-type variables, but if a struct contains a reference-type variable, the garbage collector must examine the whole struct. If we have a large array of such structs, this creates a lot of extra work for the garbage collector. In the example below, the struct contains a string, which is a reference type. The garbage collector must now examine the whole array of structs when it runs:

    public struct ItemData
    {
        public string name;
        public int cost;
        public Vector3 position;
    }
    private ItemData[] itemData;

In this example, we store the data in separate arrays instead. When the garbage collector runs, it only needs to examine the array of strings and can ignore the other arrays. This reduces the work the garbage collector has to do:

    private string[] itemNames;
    private int[] itemCosts;
    private Vector3[] itemPositions;

Another way our code can unnecessarily increase the garbage collector's workload is by holding unnecessary object references. When the garbage collector searches for object references on the heap, it must examine every current object reference in our code. Fewer object references means less work, even if we don't reduce the total number of objects on the heap. In the example below, we have a class that holds data for a dialog box. When the user has viewed this dialog, another dialog is displayed. Our code contains a reference to the next DialogData instance that should be shown, which means the garbage collector must examine this reference as part of its work:

    public class DialogData
    {
        private DialogData nextDialog;
     
        public DialogData GetNextDialog()
        {
            return nextDialog;
        }
    }

Here, we've restructured the code so that it returns an identifier that is used to look up the next DialogData instance, rather than the instance itself. The identifier is not an object reference, so it does not add to the time the garbage collector spends checking references.

    public class DialogData
    {
        private int nextDialogID;
     
        public int GetNextDialogID()
        {
            return nextDialogID;
        }
    }

On its own, this example is fairly simple. However, if our game contains a large number of objects that contain references to other objects, we can greatly reduce the complexity of the heap by restructuring the code in this way.

Timing garbage collection

Manually forcing garbage collection

Finally, we may wish to trigger garbage collection ourselves. If we know that heap memory has been allocated but is no longer in use (for example, if our code generated garbage while loading assets), and we know that a garbage collection freeze won't affect the player (for example, while a loading screen is still displayed), we can request a garbage collection with the following code:

System.GC.Collect();

This will force the garbage collector to run, freeing unused memory at our convenience. We've learned how garbage collection works in Unity, why it can cause performance issues, and how to minimize the impact of garbage collection on your game. Using this knowledge and our profiling tools, we can fix performance issues related to garbage collection and build our games so that they manage memory efficiently.
4.Optimizing graphics rendering in Unity games

Introduction

In this article, we will learn what happens when Unity renders a frame, what kinds of performance problems can occur while rendering, and how to fix performance issues related to rendering. Before reading this article, it's important to note that there is no one-size-fits-all approach to improving rendering performance. Rendering performance is affected by many factors within the game and is highly dependent on the hardware and operating system the game runs on. The most important thing to remember is that we solve performance problems by investigating, experimenting, and rigorously analyzing the results of our experiments. This article contains information about the most common rendering performance issues, along with advice on how to fix them and links to further reading. Our game may have a problem - or a combination of problems - not covered here. Even so, this article will help us understand our problem and give us the knowledge and vocabulary to search for a solution effectively.

A brief introduction to rendering

Before we get started, let's take a quick look at what happens when Unity renders a frame. Understanding the correct terminology of event streams and things will help us understand, investigate and solve performance problems. At the most basic level, rendering can be described as follows:

1. The central processing unit, the CPU, figures out what has to be drawn and how.

2. The CPU sends instructions to the Graphics Processing Unit (GPU)

3. The GPU draws graphics according to the CPU's instructions. Now let's take a closer look at what happens. We'll cover these steps in more detail later in this article, but for now let's familiarize ourselves with the terms used and understand the different roles the CPU and GPU play in rendering. A phrase often used to describe rendering is the rendering pipeline, and this is a useful image to keep in mind: efficient rendering is all about keeping information flowing.

1. For each frame rendered, the CPU does the following: the CPU checks every object in the scene to determine whether it should be rendered. An object is rendered only if it meets certain conditions; for example, some part of its bounding box must lie within the camera's view frustum. Objects that are not rendered are said to be culled. See this page for more information on frustums and frustum culling.

2. The CPU collects information about each object that is going to be rendered and sorts this data into commands called draw calls. A draw call contains data about a single mesh and how that mesh should be rendered; for example, which textures should be used. In some cases, objects that share settings can be combined into the same draw call. Combining data from different objects into the same draw call is called batching.

3. The CPU creates a packet of data called a batch for each draw call. Batches may sometimes contain data other than draw calls, but these situations are unlikely to cause common performance problems, so they are not considered in this article. For each batch containing a draw call, the CPU must now do the following:

1. The CPU can send a command to the GPU to change a number of variables, collectively known as the render state. This command is called a SetPass call. A SetPass call tells the GPU which settings to use to render the next mesh. A SetPass call is sent only if the next mesh to be rendered requires a render state that differs from that of the previous mesh.

2. The CPU sends the draw call to the GPU. The draw call instructs the GPU to render the specified mesh using the settings defined in the most recent SetPass call.

3. In some cases, more than one pass may be required for the batch. A pass is a section of shader code, and a new pass requires a change to the render state. For each pass in the batch, the CPU must issue a new SetPass call and then issue the draw call again. Meanwhile, the GPU does the following work:

1. The GPU processes tasks from the CPU in the order they are sent

2. If the current task is a SetPass call, the GPU will update the rendering state

3. If the current task is a draw call, the GPU renders the mesh. This happens in stages defined by separate sections of shader code. This part of rendering is complex and we won't cover it in detail, but it helps to understand that a piece of code called the vertex shader tells the GPU what to do with the vertices of the mesh, and then a piece of code called the fragment shader tells the GPU how to draw each pixel.

4. This process is repeated until all tasks sent from the CPU are processed by the GPU. Now that we've seen what happens when Unity renders a frame, let's consider what can go wrong while rendering.
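The interaction between the two sides described above can be summarized in a short C#-style pseudocode sketch of the CPU's command loop. All of the type and method names here (Batch, Pass, gpu.SetPass, and so on) are invented purely for illustration; this is not Unity source code.

```csharp
// Conceptual sketch only -- every type and method name here is hypothetical.
RenderState currentState = null;

foreach (Batch batch in visibleBatches)        // built from objects that survived culling
{
    foreach (Pass pass in batch.ShaderPasses)  // a shader may require several passes
    {
        // A SetPass call is issued only when the render state actually changes;
        // consecutive meshes sharing the same state skip this expensive step.
        if (!pass.State.Equals(currentState))
        {
            gpu.SetPass(pass.State);
            currentState = pass.State;
        }
        gpu.Draw(batch.MeshData);              // draw call: render with the current state
    }
}
```

The key point the sketch illustrates is why batching and shared render state matter: the fewer state changes between consecutive draws, the fewer SetPass calls the CPU must issue.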

Types of rendering problems

The most important thing to understand about rendering is this: in order to render a frame, both the CPU and the GPU must complete all of their tasks. If any one of these tasks takes too long, it delays the rendering of the frame. Rendering problems have two root causes. The first type of problem is caused by an inefficient pipeline. An inefficient pipeline occurs when one or more steps in the rendering pipeline take too long to complete, interrupting the smooth flow of data. Inefficiencies within the pipeline are known as bottlenecks. The second type of problem is caused by trying to push too much data through the pipeline. Even the most efficient pipeline has a limit to the amount of data it can process in one frame.

When our game takes too long to render a frame because the CPU takes too long to perform its rendering tasks, our game is said to be CPU bound.

When our game takes too long to render a frame because the GPU takes too long to perform its rendering tasks, our game is said to be GPU bound.

Understanding rendering problems

Before making any changes, it is very important to use profiling tools to understand the cause of a performance problem. Different problems require different solutions. It is also very important to measure the impact of every change we make; fixing performance issues is a balancing act, and improving one aspect of performance can negatively affect another. We will use two tools to help us understand and fix rendering performance issues: the Profiler window and the Frame Debugger. Both tools are built into Unity.

The Profiler window

The Profiler window allows us to view real-time data about how the game is performing. We can use the Profiler window to view data on many aspects of the game, including memory usage, rendering pipeline, and user script performance. This page of the Unity Manual is a good introduction.

The Frame Debugger

The frame debugger allows us to see step by step how a frame is rendered. Using the frame debugger, we can see detailed information like what is being drawn during each draw call, the shader attributes for each draw call and the order of events sent to the GPU. This information helps us understand how the game is rendered and where we can improve performance. If you are not yet familiar with using the Frame Debugger, this page of the Unity Manual is a very useful guide to what it does and this tutorial video shows it in use.

Finding the cause of performance problems

Before we try to improve the rendering performance of our game, we must confirm that our game is running slowly due to rendering issues. There's no point in trying to optimize our rendering performance if the real cause of the problem is an overly complex user script! Once we've established that our problem is with rendering, we also have to understand whether our game is CPU bound or GPU bound. These different problems require different solutions, so it is crucial to understand the cause of the problem before attempting to fix it. If you are still not sure whether your game is CPU bound or GPU bound, you should follow this tutorial. Once we are sure that our problem is related to rendering, and we know whether our game is CPU bound or GPU bound, it's time to read on.

If our game is CPU bound

Generally speaking, the work that the CPU must do to render a frame falls into three categories:

1. Decide what to draw

2. Prepare commands for the GPU

3. Send commands to the GPU

These broad categories contain many individual tasks that can execute across multiple threads. Threads allow independent tasks to occur concurrently; while one thread is performing one task, another thread can perform a completely independent task. This means work can be done faster.

When rendering tasks are split across different threads, this is known as multithreaded rendering. There are three types of thread involved in Unity's rendering process: the main thread, the render thread, and worker threads. The main thread is where the majority of our game's CPU work, including some rendering tasks, takes place. The render thread is a dedicated thread that sends commands to the GPU. Worker threads each perform a single task, such as culling or mesh skinning.

Which tasks are performed on which thread depends on our game's settings and the hardware the game runs on. For example, the more CPU cores our target hardware has, the more worker threads can be spawned. For this reason, it is very important to profile our game on its target hardware; our game may perform very differently on different devices.

Because multithreaded rendering is complex and hardware-dependent, we must understand which tasks are causing our game to be CPU bound before attempting to improve performance. If our game is running slowly because culling operations are taking too long on one thread, reducing the time it takes to send commands to the GPU on a different thread won't help.

Note: not all platforms support multithreaded rendering; at the time of writing, WebGL does not support this feature. On platforms that do not support multithreaded rendering, all CPU tasks are performed on the same thread. If we are CPU bound on such a platform, optimizing any CPU work will improve CPU performance. If this is the case for our game, we should read all of the sections below and consider which optimizations are most suitable.

Graphics jobs

The graphics jobs option in the player settings determines whether Unity uses worker threads to perform rendering tasks that would otherwise be done on the main thread and, in some cases, the rendering thread. On platforms that offer this feature, it can provide a considerable performance boost. If we wish to use this feature, we should profile the game with and without graphics jobs enabled and see how it affects performance.
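As a sketch of how this before-and-after comparison might be automated, the editor-only script below toggles the setting through the PlayerSettings scripting API so that comparison builds can be made quickly. The menu paths are arbitrary names chosen for this example.

```csharp
using UnityEditor;

// Editor-only sketch: toggle Graphics Jobs from code, equivalent to changing
// the option in Player Settings > Other Settings, so we can profile builds
// with and without the feature enabled.
public static class GraphicsJobsToggle
{
    [MenuItem("Tools/Enable Graphics Jobs")]
    static void Enable()  { PlayerSettings.graphicsJobs = true; }

    [MenuItem("Tools/Disable Graphics Jobs")]
    static void Disable() { PlayerSettings.graphicsJobs = false; }
}
```

This script must live in an `Editor` folder; the setting affects the next build, not the currently running editor session.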

Finding out which tasks are contributing to problems

We can determine which tasks are causing our game to be CPU bound by using the Profiler window. This tutorial will show how to determine what the problem is. Now that we understand what tasks are causing our game to be CPU bound, let's take a look at some common problems and their solutions.

Sending commands to the GPU

The time it takes to send commands to the GPU is the most common reason why games are CPU bound. This task is performed on the render thread on most platforms, although on some platforms (such as PlayStation 4) it may be performed by worker threads. The most expensive operation when sending commands to the GPU is the SetPass call. If our game is CPU bound due to sending commands to the GPU, reducing the number of SetPass calls is probably the best way to improve performance. We can see how many SetPass calls and batches are being sent in the Rendering area of Unity's Profiler window. The number of SetPass calls that can be sent before performance suffers depends largely on the target hardware; a high-end PC can send many more SetPass calls than a mobile device before performance suffers. The number of SetPass calls and its relationship to the number of batches depends on several factors, which we cover in more detail later in this article. However, it is usually the case that:

1. Reducing the number of batches and/or having more objects share the same render state will, in most cases, reduce the number of SetPass calls.

2. Reducing the number of SetPass calls will, in most cases, improve CPU performance. Even if reducing the number of batches does not reduce the number of SetPass calls, it may still lead to a performance improvement on its own. This is because the CPU can process a single batch more efficiently than multiple batches, even if they contain the same amount of mesh data. Broadly speaking, there are three ways to reduce the number of batches and SetPass calls. We'll take a deeper look at each:

1. Reducing the number of objects to render may reduce batch and SetPass calls.

2. Reducing the number of times each object has to be rendered will generally reduce the number of SetPass calls.

3. Combining the data of objects that must be rendered into fewer batches will reduce the number of batches. Different techniques will suit different games, so we should consider all of these options, experiment, and decide which ones work for our game.

Reducing the number of objects being rendered

Reducing the number of objects that must be rendered is the simplest way to reduce the number of batches and SetPass calls. There are several techniques we can use:

1. Simply reducing the number of visible objects in the scene can be an effective solution. For example, if we're rendering a large crowd of different characters, we can experiment with fewer characters in the scene. If the scene still looks good and performance improves, this may be a much faster solution than more sophisticated techniques.

2. We can reduce our camera's draw distance using the camera's Far Clip Plane property. This property is the distance beyond which the camera no longer renders objects. If we want to mask the fact that distant objects are no longer visible, we can use fog to hide their absence (see the Lighting window documentation: https://docs.unity3d.com/Manual/lighting-window.html).

3. For a more fine-grained approach to hiding objects based on distance, we can use our camera's Layer Cull Distances property to provide custom culling distances for objects on different layers. This approach can be useful if we have many small foreground decorative details; we can hide these details at a much shorter distance than large terrain features.

4. We can use a technique called occlusion culling to disable the rendering of objects that are hidden by other objects. For example, if we have a large building in our scene, we can use occlusion culling to disable rendering of objects behind it. Unity's occlusion culling doesn't work in every scene, can cause extra CPU overhead, and can be complicated to set up, but it can greatly improve performance in some scenes. This Unity blog post on occlusion culling best practices is a great guide to the subject. In addition to using Unity's occlusion culling, we can also implement our own occlusion culling by manually deactivating objects that we know the player cannot see. For example, if our scene contains objects that are used for cutscenes, but are not visible before or after, we should disable them. Using our own game knowledge is always more efficient than letting Unity solve problems dynamically.
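Points 2 and 3 above can be sketched in a small camera component. This is a minimal example: the far clip distance, the layer name "SmallDetails", and the 50-unit cull distance are illustrative assumptions, not recommendations.

```csharp
using UnityEngine;

// Sketch: shorten the draw distance, mask the cutoff with fog, and cull small
// decoration objects at a shorter distance than everything else.
[RequireComponent(typeof(Camera))]
public class CullingSetup : MonoBehaviour
{
    void Start()
    {
        Camera cam = GetComponent<Camera>();

        // Reduce the overall draw distance and enable fog to hide the cutoff.
        cam.farClipPlane = 300f;
        RenderSettings.fog = true;

        // Per-layer culling: one entry per layer; 0 means "use farClipPlane".
        // Objects on the hypothetical "SmallDetails" layer vanish beyond 50 units.
        float[] distances = new float[32];
        distances[LayerMask.NameToLayer("SmallDetails")] = 50f;
        cam.layerCullDistances = distances;
    }
}
```

As always, the right distances are found by profiling on the target hardware, not by guessing.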

Reducing the number of times each object must be rendered

Real-time lighting, shadows and reflections add a lot of realism to a game, but can be very expensive. Using these features can cause objects to be rendered multiple times, which can greatly affect performance. The exact impact of these features depends on the rendering path we choose for our game. A rendering path is a term that describes the order in which calculations are performed when drawing a scene. The main difference between rendering paths is how they handle real-time lighting, shadows, and reflections. In general, if our game runs on high-end hardware and uses a lot of real-time lights, shadows, and reflections, deferred rendering may be a better choice. If our game runs on low-end hardware and doesn't use these features, forward rendering may be more appropriate. However, this is a very complex issue, and if we wish to take advantage of real-time lighting, shadows and reflections, it is best to research the topic and experiment carefully. This page of the Unity Manual gives more information on the different rendering paths available in Unity and is a useful jumping-off point, and this tutorial contains useful information on lighting in Unity. Regardless of the rendering path chosen, the use of real-time lights, shadows, and reflections can affect our game's performance, so it's important to understand how to optimize them.

1. Dynamic lighting in Unity is a very complex topic, and discussing it in depth is beyond the scope of this article, but this page of the Unity Manual details common lighting optimizations that could help.

2. Dynamic lighting is expensive. When our scene contains non-moving objects, such as landscapes, we can use a technique called baking to precompute the scene's lighting so that runtime lighting calculations are not required. This tutorial gives an introduction to the technique, and this section of the Unity Manual covers baked lighting in detail.

3. If we use real-time shadows in our game, this may be an area where we can improve performance. This page of the Unity Manual is a good guide to the shadow properties that can be adjusted in Quality Settings, and how these affect appearance and performance. For example, we can use the Shadow Distance property to ensure that only nearby objects cast shadows: objects close to the camera cast shadows, while objects farther away do not.

Additionally, the scene often looks better without distant shadows.

4. Reflection probes can produce realistic reflections, but can be very expensive. It is best to keep the use of reflections to a minimum where performance is a consideration, and to optimize them as much as possible where they are used.
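As a minimal sketch of the shadow distance adjustment described in point 3, the component below limits real-time shadows to nearby objects at runtime. The 40-unit value is purely illustrative; the right distance for any given game must be found by profiling.

```csharp
using UnityEngine;

// Sketch: restrict real-time shadows to objects near the camera. Objects
// beyond this distance simply do not cast shadows, which reduces the number
// of times they must be rendered into shadow maps.
public class ShadowTuning : MonoBehaviour
{
    void Start()
    {
        // Illustrative value -- overrides the Shadow Distance from Quality Settings.
        QualitySettings.shadowDistance = 40f;
    }
}
```

The same property can be set per quality level in the Quality Settings window instead of in code; a script is mainly useful for adjusting it dynamically, e.g. per scene or per device tier.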

Combining objects into fewer batches

Batches can contain the data of more than one object when certain conditions are met. To be eligible for batching, objects must:

1. Share the same instance of the same material

2. Have identical material settings (i.e., textures, shaders, and shader parameters). Batching eligible objects can improve performance, although, as with all optimization techniques, we must profile carefully to ensure that the cost of batching does not exceed the performance gain. There are a few different techniques for batching eligible objects:

1. Static batching is a technique that allows Unity to batch nearby eligible objects that don't move. A good example of something that can benefit from static batching is a pile of similar objects, such as boulders. This page of the Unity Manual contains instructions on setting up static batching in our game. Static batching results in higher memory usage, so we should bear this cost in mind when profiling our game.

2. Dynamic batching is another technique that allows Unity to batch eligible objects, regardless of whether they are moving or not. There are some limitations on the objects that can be batched using this technique. These restrictions are listed, along with instructions, on this page of the Unity Manual. Dynamic batching has a CPU cost of its own, which can exceed the CPU time it saves. We should keep this cost in mind when experimenting with this technique, and use it with caution.

3. Batching Unity's UI elements is a little more complex, because it can be affected by the layout of the UI. This video from Unite Bangkok 2015 gives a good overview of the subject, and this guide to optimizing Unity UI provides in-depth information on how to ensure that UI batching works as we intend.

4. GPU instancing is a technique that allows large numbers of identical objects to be batched very efficiently. Its use is limited and not all hardware supports it, but if our game has many identical objects on screen at the same time, we might benefit from this technique. This page of the Unity Manual contains an introduction to the details of how GPU Instancing works with Unity, which platforms support it, and under what circumstances it might benefit our game.

5. Texture atlasing is a technique in which multiple textures are combined into one larger texture. It is commonly used in 2D games and UI systems, but can also be used in 3D games. If we use this technique when creating our game's art, we can ensure that objects share textures and are thus eligible for batching. Unity has a built-in texture atlasing tool called Sprite Packer for use with 2D games.

6. Meshes that share the same material and texture can be combined, either manually in the Unity Editor or via code at runtime. When combining meshes this way, we must be aware that shading, lighting, and culling are still applied per object; this means that the performance gain from combining meshes can be offset by no longer being able to cull objects that would otherwise not be rendered. If we wish to investigate this approach, we should examine the Mesh.CombineMeshes function. The CombineChildren script in Unity's Standard Assets package is an example of this technique.

7. We must be very careful about accessing Renderer.material in scripts. Doing so copies the material and returns a reference to the new copy. If the renderer was part of a batch, this breaks the batch, because the renderer no longer references the same material instance. If we want to access a batched object's material in a script, we should use Renderer.sharedMaterial instead.
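Points 6 and 7 can be illustrated with two small sketches, assuming child objects whose meshes share a single material. Both are simplified and omit error handling; the class names are invented for this example.

```csharp
using UnityEngine;

// Sketch for point 6: merge child meshes into one mesh at runtime, so they
// become a single draw call candidate. Assumes the parent object has no
// MeshFilter/MeshRenderer of its own and all children share one material.
public class MeshCombiner : MonoBehaviour
{
    void Start()
    {
        MeshFilter[] filters = GetComponentsInChildren<MeshFilter>();
        CombineInstance[] combine = new CombineInstance[filters.Length];

        for (int i = 0; i < filters.Length; i++)
        {
            combine[i].mesh = filters[i].sharedMesh;
            combine[i].transform = filters[i].transform.localToWorldMatrix;
            filters[i].gameObject.SetActive(false);   // hide the originals
        }

        Mesh combined = new Mesh();
        combined.CombineMeshes(combine);              // bake all children into one mesh
        gameObject.AddComponent<MeshFilter>().sharedMesh = combined;
        gameObject.AddComponent<MeshRenderer>();
    }
}

// Sketch for point 7: the Renderer.material pitfall.
public class MaterialAccess : MonoBehaviour
{
    void TintRed()
    {
        Renderer rend = GetComponent<Renderer>();

        // BAD for batched objects: .material silently duplicates the material,
        // giving this renderer a unique instance and breaking the batch:
        // rend.material.color = Color.red;

        // Batching-friendly: .sharedMaterial modifies the shared asset, so the
        // batch survives -- but the change applies to every object using it.
        rend.sharedMaterial.color = Color.red;
    }
}
```

Remember the trade-off noted above: once meshes are combined, Unity can no longer cull the pieces individually.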

Culling, sorting and batching

Culling, gathering data on the objects to be drawn, sorting this data into batches, and generating GPU commands can all contribute to our game being CPU bound. These tasks are performed either on the main thread or on separate worker threads, depending on our game's settings and target hardware.

1. Culling itself is unlikely to be very expensive, but reducing unnecessary culling may help performance. There is a per-object, per-camera overhead for all active scene objects, even those on layers that are not being rendered. To reduce this, we should disable cameras and renderers that are not currently in use.

2. Batching can greatly reduce the cost of sending commands to the GPU, but it can sometimes add overhead elsewhere. If batching operations are causing our game to be CPU bound, we may wish to limit the number of manual or automatic batching operations in our game.

Skinned meshes

When using SkinnedMeshRenderers, we deform a mesh using a technique called skeletal animation. It is most commonly used for animated characters. Tasks related to rendering skinned meshes are typically performed on the main thread or on separate worker threads, depending on our game's settings and target hardware. Rendering skinned meshes can be an expensive operation. If we can see in the Profiler window that skinned mesh rendering is causing our game to be CPU bound, there are a few things we can try to improve performance:

1. We should consider whether we need to use it for every object that currently uses the SkinnedMeshRenderer component. Maybe we've imported a model that uses the SkinnedMeshRenderer component, but we haven't really animated it. In this case, replacing the SkinnedMeshRenderer component with the MeshRenderer component will help improve performance. When importing a model into Unity, if we choose not to import animations in the model's Import Settings, the model will have a MeshRenderer instead of a SkinnedMeshRenderer.

2. If we only need an object to animate at certain times (for example, only at startup, or only when within a certain distance of the camera), we could swap its mesh for a less detailed version, or swap its SkinnedMeshRenderer component for a MeshRenderer component.

3. The SkinnedMeshRenderer component has a BakeMesh function that creates a mesh in a matching pose; this is useful for swapping between different meshes or renderers without any visible change to the object.

4. This page of the Unity Manual contains suggestions for optimizing animated characters that use skinned meshes, and the Unity Manual page on the SkinnedMeshRenderer component contains some tweaks that can improve performance. In addition to the advice on these pages, it's worth remembering that the cost of mesh skinning increases per vertex; therefore, using fewer vertices in our model reduces the amount of work that has to be done.

5. On some platforms, skinning can be handled by the GPU instead of the CPU. If we have a lot of spare capacity on the GPU, this option may be worth a try. We can enable GPU skinning for the current platform and quality target in Player Settings.
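The mesh/renderer swap described in points 2 and 3 can be sketched as follows. This is a minimal example: the public fields are assumed to be wired up in the Inspector, and how and when `FreezePose` is called (e.g. based on camera distance) is left to the game.

```csharp
using UnityEngine;

// Sketch: bake a SkinnedMeshRenderer's current pose into a static mesh so a
// cheaper MeshRenderer can take over, e.g. for distant or inactive characters.
public class SkinnedToStatic : MonoBehaviour
{
    public SkinnedMeshRenderer skinned; // the expensive animated renderer
    public MeshFilter staticFilter;     // MeshFilter on a cheap replacement object

    public void FreezePose()
    {
        Mesh baked = new Mesh();
        skinned.BakeMesh(baked);              // snapshot of the current pose
        staticFilter.sharedMesh = baked;

        skinned.gameObject.SetActive(false);  // disable the expensive renderer
        staticFilter.gameObject.SetActive(true);
    }
}
```

Because the baked mesh matches the pose at the moment of baking, the swap causes no visible change to the object.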

Main thread operations unrelated to rendering

It's important to understand that many CPU tasks unrelated to rendering also happen on the main thread. This means that if we are CPU bound on the main thread, we can improve performance by reducing the time the CPU spends on non-rendering tasks. For example, our game may at some point perform expensive rendering operations and expensive user script operations on the main thread, making us CPU bound. If we have optimized the rendering operations as far as we can without losing visual fidelity, we may still be able to improve performance by reducing the CPU cost of our scripts.

If our game is GPU bound

If our game is GPU bound, the first thing to do is find out what is causing the GPU bottleneck. GPU performance is most often limited by fill rate, especially on mobile devices, but memory bandwidth and vertex processing can also be issues. Let's examine each of these problems and learn what causes it, how to diagnose it, and how to fix it.

Fill rate

Fill rate refers to the number of pixels the GPU can render to the screen per second. If our game is fill-rate limited, it means our game is trying to draw more pixels per frame than the GPU can handle. Checking if the fill rate is causing our game to be GPU bound is simple:

1. Profile the game and make a note of the GPU time.

2. Reduce the display resolution in Player Settings.

3. Profile the game again. If performance has improved, fill rate is likely the problem.
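For quick iteration on a device, step 2 can also be performed at runtime from a small debug script. A minimal sketch, assuming we are happy to toggle the resolution with a debug key:

```csharp
using UnityEngine;

// Sketch of the fill rate test: halve the rendering resolution at runtime and
// re-profile. If GPU time drops sharply, the game is likely fill rate bound.
public class FillRateTest : MonoBehaviour
{
    void Update()
    {
        // F1 is an arbitrary debug key chosen for this example.
        if (Input.GetKeyDown(KeyCode.F1))
        {
            Screen.SetResolution(Screen.width / 2, Screen.height / 2, true);
        }
    }
}
```

Halving both dimensions quarters the number of pixels drawn, so a fill-rate-bound game should show a clear improvement in GPU time.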

If fill rate is the cause of the problem, there are a few things that can help us fix it.

1. A fragment shader is the part of the shader code that tells the GPU how to draw individual pixels. This code is executed by the GPU for every pixel it must draw, so if the code is inefficient, performance problems can easily pile up. Complex fragment shaders are a very common cause of fill rate problems.

2. If our game uses built-in shaders, we should aim to use the simplest and most optimized shaders that produce the visual effect we want.

3. For example, the mobile shaders that ship with Unity are highly optimized; we should try to use them and see if we can improve performance without affecting the appearance of the game. These shaders are designed for use on mobile platforms, but they will work with any project. Using "mobile" shaders to improve performance on non-mobile platforms is great if they provide the visual fidelity your project requires.

4. If we use Unity's Standard Shader, it is important to understand that Unity compiles this shader based on the current material settings; only the features currently in use are compiled. This means that removing features such as detail maps can greatly simplify the fragment shader code and thus greatly improve performance. Again, if this is the case in our game, we should experiment with the settings and see whether we can improve performance without compromising visual quality.

5. If our project uses custom shaders, we should optimize them as much as possible. Optimizing shaders is a complex subject, but this page of the Unity Manual and the shader optimization section of this page of the Unity Manual contain useful starting points for optimizing our shader code.

6. Overdraw is the term for the same pixel being drawn multiple times. It happens when objects are drawn on top of other objects, and it contributes significantly to fill rate problems. To understand overdraw, we must understand the order in which Unity draws objects in the scene. An object's shader determines its draw order, usually by specifying which render queue the object is in. Unity uses this information to draw objects in a strict order, as detailed on this page of the Unity Manual. Additionally, objects in different render queues are sorted differently before being drawn. For example, Unity sorts objects in the Geometry queue front-to-back to minimize overdraw, but sorts objects in the Transparent queue back-to-front to achieve the required visual effect. The net effect of this back-to-front sorting is to maximize overdraw among objects in the Transparent queue. Overdraw is a complex topic with no single general solution, but reducing the number of overlapping objects that Unity cannot automatically sort is key. The best place to start investigating is Unity's Scene view, which has a draw mode that lets us see the overdraw in our scene and, from there, determine where we can work to reduce it. The most common culprits of excessive overdraw are transparent materials, unoptimized particles, and overlapping UI elements, so we should try to optimize or reduce these. This article on the Unity Learn site focuses primarily on Unity's UI, but also contains good general guidance on overdraw.

7. Image effects can contribute greatly to fill rate problems, especially if we use more than one. If our game uses image effects and is struggling with fill rate issues, we may wish to experiment with different settings or a more optimized version of the effect (such as Bloom (Optimized) instead of Bloom). If our game uses multiple image effects on the same camera, this results in multiple shader passes. In such cases, it may be beneficial to combine the shader code of the image effects into a single pass, for example using Unity's PostProcessing Stack. If we have optimized our image effects and still have fill rate problems, we may need to consider disabling image effects, particularly on lower-end devices.

Memory bandwidth

Memory bandwidth refers to the speed at which the GPU can read from and write to its dedicated memory. If our game is memory bandwidth bound, it usually means we are using textures that are too large for the GPU to handle quickly. To check whether memory bandwidth is the problem, we can do the following:

1. Profile the game and make a note of the GPU time.

2. Reduce the Texture Quality for the current platform and quality target in Quality Settings.

3. Profile the game again and note the GPU time. If performance has improved, memory bandwidth is likely the problem. If memory bandwidth is our problem, we must reduce the game's texture memory usage. Again, the best technique will vary from game to game, but there are a few ways we can optimize our textures:

1. Texture compression is a technique that can greatly reduce the size of textures on disk and in memory. If memory bandwidth is an issue in our game, using texture compression to reduce the size of textures in memory can improve performance. There are many different texture compression formats and settings in Unity, and each texture can have individual settings.

In general, some form of texture compression should be used whenever possible; however, a trial and error approach to finding the optimal setting for each texture works best. This page in the Unity Manual contains useful information on different compression formats and settings.

2. Mipmaps are lower-resolution versions of a texture that Unity can use for distant objects. If our scene contains objects that are far from the camera, we can use mipmaps to alleviate memory bandwidth problems. The Mipmaps draw mode in the Scene view lets us see which objects in our scene could benefit from mipmaps, and this page of the Unity Manual contains more information on enabling mipmaps for a texture.
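Mipmaps are normally enabled per texture in its import settings, but as an editor-scripting sketch this can also be done via the TextureImporter API. The asset path below is a hypothetical example; substitute a texture from your own project.

```csharp
using UnityEditor;
using UnityEngine;

// Editor-only sketch: ensure a texture has mipmaps enabled, equivalent to
// ticking "Generate Mip Maps" in the texture's import settings.
public static class MipmapEnabler
{
    [MenuItem("Tools/Enable Mipmaps On Example Texture")]
    static void EnableMipmaps()
    {
        string path = "Assets/Textures/rock_albedo.png"; // hypothetical path
        var importer = AssetImporter.GetAtPath(path) as TextureImporter;
        if (importer != null && !importer.mipmapEnabled)
        {
            importer.mipmapEnabled = true;  // Unity generates the mip chain on import
            importer.SaveAndReimport();
        }
    }
}
```

Note that mipmaps increase a texture's memory footprint by roughly a third, so they are a trade: more memory used in exchange for less bandwidth consumed when sampling distant objects.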

Vertex processing

Vertex processing refers to the work that the GPU must do to render each vertex in the mesh. The cost of vertex processing is affected by two factors: the number of vertices that must be rendered, and the number of operations that must be performed on each vertex.

If our game is GPU bound, and we've determined it's not limited by fill rate or memory bandwidth, then it's likely that vertex processing is causing the problem. If that's the case, trying to reduce the amount of vertex processing the GPU has to do may result in performance gains. There are a few methods we can consider to help us reduce the number of vertices or the number of operations we perform on each vertex.

1. First, we should aim to reduce any unnecessary mesh complexity. If we use meshes with a level of detail that can't even be seen in the game, or meshes that are inefficient and contain too many vertices due to how they were created, this is wasted work for the GPU. The simplest way to reduce the cost of vertex processing is to create meshes with lower vertex counts in our 3D art program.

2. We can try a technique called normal mapping, where textures are used to create the illusion of greater geometric complexity on the mesh. While this technique has some GPU overhead, in many cases it leads to performance gains. This page of the Unity Manual has a useful guide to using normal maps to simulate complex geometric meshes.

3. If a mesh does not use normal mapping in our game, we can disable the use of vertex tangents for that mesh in its import settings. This reduces the amount of data sent to the GPU for each vertex.

4. Level of detail (also known as LOD) is an optimization technique that reduces the complexity of meshes away from the camera. This reduces the number of vertices rendered by the GPU without affecting the visual quality of the game. The LOD Group page of the Unity Manual contains more information on how to set up LOD in our game.

5. A vertex shader is a block of shader code that tells the GPU how to draw each vertex. If our game is vertex processing bound, reducing the complexity of the vertex shader might help.

6. If our game uses built-in shaders, we should aim to use the simplest and most optimized shaders to get the visual effect we want. For example, the mobile shaders that ship with Unity are highly optimized; we should experiment with them and see if we can improve performance without affecting the appearance of the game.

7. If our project uses custom shaders, we should optimize them as much as possible. Optimizing shaders is a complex issue, but this page of the Unity Manual and the Shader Optimization section of this page of the Unity Manual contain useful starting points for optimizing our shader code.
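The LOD technique from point 4 is normally configured in the Inspector, but as a sketch it can also be set up from code. The renderer fields and the 50%/10% screen-height thresholds below are illustrative assumptions, not recommended values.

```csharp
using UnityEngine;

// Sketch: configure a LODGroup from code. highDetail and lowDetail are
// assumed to be assigned in the Inspector, pointing at a full mesh and a
// reduced-vertex version of the same object.
public class LodSetup : MonoBehaviour
{
    public Renderer highDetail; // full-detail mesh
    public Renderer lowDetail;  // reduced-vertex mesh

    void Start()
    {
        LODGroup group = gameObject.AddComponent<LODGroup>();
        LOD[] lods = new LOD[]
        {
            // Used while the object covers more than 50% of screen height.
            new LOD(0.5f, new Renderer[] { highDetail }),
            // Used down to 10% of screen height; below that the object is culled.
            new LOD(0.1f, new Renderer[] { lowDetail }),
        };
        group.SetLODs(lods);
        group.RecalculateBounds();
    }
}
```

The far-away LOD levels render far fewer vertices, so the GPU's vertex processing load drops for distant objects without a visible loss of quality.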

Conclusion

We've learned how rendering works in Unity, what kind of problems can occur while rendering, and how to improve rendering performance in our game. Using this knowledge and our profiling tools, we can fix rendering-related performance issues and organize our games so that they have a smooth and efficient rendering pipeline.
Copyright notice: this article is an original post by CSDN blogger "oLingXi12", licensed under CC 4.0 BY-SA; please include the original source link and this statement when reprinting.
Original link: https://blog.csdn.net/oLingXi12/article/details/106251046/
