Unity ECS

Author: Pang Wei Wei
Link: https://www.zhihu.com/question/286963885/answer/452979420
Source: Zhihu
Copyrighted by the author. For commercial reprints, please contact the author for authorization. For non-commercial reprints, please indicate the source.

ECS solves 2 problems:

1) Performance;

2) Reducing unnecessary memory usage.

Here is a chart from a demo test I wrote earlier. It compares performance in three cases: using ECS, not using ECS, and using instancing optimization.

The chart shows that when fewer than 500 objects are rendered, ECS does not improve performance significantly. Above 1000 objects, ECS has a clear advantage, and at 10000 objects the performance gap is almost 100x.

So for games with fewer than about 200 objects, it makes little difference whether you use ECS.

In addition, ECS is a systematic approach and standard proposed by Unity. Traditional methods can achieve broadly similar results, so ECS is not strictly necessary.

The demo is shown in the figure below (Instancing). Starting from the built-in rotate demo, I completed corresponding instancing and traditional versions. The demo has 1000 cubes plus one rotating sphere. When the sphere hits a cube, the cube rotates for a while and gradually stops, so 1001 objects need to be updated continuously:

========= Supplement: the complete article I shared within my company:

No longer need MonoBehaviour, Component and GameObject

In the past, MonoBehaviour carried two responsibilities: game logic and data. We created a GameObject, added an MB (MonoBehaviour, the same below), and then ran the game logic through Update, often updating some data there. The implementation of MB is also very complicated: it inherits many functions wholesale, most of which a given script does not need, so even a very simple task may waste a lot of memory on them. Used without restraint, MB is basically a nightmare for game efficiency.

Components written the traditional way inherit from MB. Most of the time a Component's role is to provide data, but an array of Components is not friendly to the CPU cache, because the data that is recalculated and updated repeatedly is not stored together. The CPU may therefore hit cache misses during calculation, and if the game logic updates data on a large number of objects, this cost can be very large.

MB also cannot solve the execution-order problem. For dynamically created GameObjects, the update order of MBs is undefined. We often need some MBs to Update and Destroy after others in order to respect their dependencies, but Unity's MB design offers no good solution. Adjusting the Script Execution Order is both troublesome and ineffective (in my earlier tests it had no effect on dynamically created MBs and only worked for static MBs in the scene).

Also, if a game has many GameObjects, each with many Components attached, executing all those Updates is very time-consuming, and they can only run on the main thread, not in parallel.

To this end, Unity 2018.2 introduced a brand-new ECS system to solve the problems mentioned above.

Fully data driven

The core design goal of ECS is to remove the traditional MB, GameObject, and Component object structure and replace it with a fully data-driven code organization. This change solves two problems:

1) Organize data more effectively and improve CPU cache utilization;

2) Parallelization.

Let's take a brief look at an example that ships with the ECS samples:

In this example, although there are more than 1000 objects on screen, there are no corresponding 1000 GameObjects. When the sphere hits a cube, the cube's rotation decays gradually, and the demo maintains a frame rate of 300-600 fps. In the past, we needed to create 1000 GameObjects to achieve a similar effect, with 1000 MBs responsible for updating the transform of each GameObject. I implemented the demo that way as well, and it reaches only about 1/3 of the fps.

Note that the version in the picture above creates many more GameObjects and runs at 100-200 fps. Of course, this implementation is not optimal: we can also use instancing to reduce draw calls. For comparison, I implemented an instancing version as well:

The fps is about 150-300, so instancing improves on the 100-200 fps of the plain GameObject version. With 1000 objects, the implementations differ by a factor of about 1-2, which does not seem large, so I also measured the fps at higher object counts and obtained the following chart:

The chart shows that the advantage of ECS appears at higher object counts. With 10000 objects, ECS still reaches about 350 fps, while the instancing-optimized version drops to only 4 fps, a gap of almost 100x (350 / 4 is about 88).

Now let us return to the beginning. ECS solves the following two problems:

1) Organize data more effectively and improve CPU cache utilization;

2) Parallelization.

But how does it solve them?

Organize data more effectively and improve CPU cache utilization

The traditional approach stores a GameObject's data on Components, for example like this (reference 1):

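using Unity.Mathematics;

class PlayerData // 48 bytes
{
    public string name;          // 8 bytes
    public int[] someValues;     // 8 bytes
    public quaternion rotation;  // 16 bytes; field name assumed, lost in the source
    public float3 position;      // 12 bytes; field name assumed, lost in the source
}

PlayerData[] players;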

Its memory layout looks like this:

This design is extremely unfriendly to the CPU cache. When you try to update the float3 data in batches, cache prefetching keeps failing because the objects are allocated in non-contiguous areas of memory, and the CPU needs several pointer jumps to find and operate on each value. If the data that is updated in batches is stored contiguously, the cache hit rate rises sharply and the calculation gets faster.

Under the ECS design, the data that is updated in batches is extracted and laid out contiguously in memory, so that cache prefetching reads the following data in one go, which improves CPU efficiency, as shown in the figure (reference 1):

In the actual calculation, indirect indexing is used to update each type of data across all entities, rather than all the data of one entity at a time, as shown in the figure:

As you can see here, the biggest change is:

In the update of a system (the counterpart of the previous component), the positions of all entities are put together and updated in one batch. Because the float3 position data is contiguous in memory, the CPU processes it at its fastest.
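
The layout idea above can be sketched in plain C#, outside Unity. This is only an illustration under assumptions: `MovementData`, `MovementSystem`, and `Integrate` are hypothetical names, `System.Numerics.Vector3` stands in for `float3`, and Unity's real storage is more elaborate than two arrays.

```csharp
using System;
using System.Numerics;

// Hypothetical sketch of a "structure of arrays" layout: each kind of data
// lives in its own contiguous array, and an entity is just an index.
public sealed class MovementData
{
    public readonly Vector3[] Positions;   // all positions, packed together
    public readonly Vector3[] Velocities;  // all velocities, packed together

    public MovementData(int entityCount)
    {
        Positions = new Vector3[entityCount];
        Velocities = new Vector3[entityCount];
    }
}

public static class MovementSystem
{
    // One pass over contiguous memory updates the position of every entity.
    public static void Integrate(MovementData data, float dt)
    {
        for (int i = 0; i < data.Positions.Length; i++)
        {
            data.Positions[i] += data.Velocities[i] * dt;
        }
    }
}
```

Because `Vector3` is a value type, each array is one contiguous block, so the loop walks memory linearly instead of chasing one reference per object.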

Parallelization

After extracting the data separately, the next task is to parallelize the calculations on it and make full use of multiple cores. In the past, almost all logic code ran on the main thread. Some project teams recognized this problem and moved logic unrelated to display onto multiple threads, but they did not abstract the data thoroughly enough to form a complete development framework. ECS solves this problem.

ECS exposes the underlying job system and provides the C# job system on top of it, a multi-threaded scheduling system. If pieces of data do not depend on each other, you only need to hand them to multiple threads through the C# job system for parallel computation, for example:

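using Unity.Collections;
using Unity.Entities;
using Unity.Jobs;
using Unity.Mathematics;
using Unity.Transforms;
using UnityEngine;

public class RotationSpeedSystem : JobComponentSystem
{
    // Runs once for every entity that has both Rotation and RotationSpeed.
    struct RotationSpeedRotation : IJobProcessComponentData<Rotation, RotationSpeed>
    {
        public float dt;

        public void Execute(ref Rotation rotation, [ReadOnly] ref RotationSpeed speed)
        {
            // Rotate around the up axis by speed * dt.
            rotation.Value = math.mul(math.normalize(rotation.Value), quaternion.axisAngle(math.up(), speed.Value * dt));
        }
    }

    protected override JobHandle OnUpdate(JobHandle inputDeps)
    {
        var job = new RotationSpeedRotation() { dt = Time.deltaTime };
        // inputDeps carries the jobs this one must wait for.
        return job.Schedule(this, 64, inputDeps);
    }
}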

If pieces of data do depend on each other, so that one calculation needs the results of another, you can declare dependencies between tasks, and the C# job system will order and invoke the tasks accordingly.
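
The two cases above (independent data versus dependent tasks) can be illustrated with plain .NET tasks. This is not the Unity C# job system API: `Parallel.For` and `Task.ContinueWith` from `System.Threading.Tasks` are used as stand-ins, and `JobSketch`, `RotateAll`, and `RotateThenSum` are hypothetical names chosen for this sketch.

```csharp
using System;
using System.Threading.Tasks;

public static class JobSketch
{
    // Independent data: every element can be updated on any worker thread.
    public static void RotateAll(float[] angles, float[] speeds, float dt)
    {
        Parallel.For(0, angles.Length, i =>
        {
            angles[i] += speeds[i] * dt;
        });
    }

    // Dependent work: the sum may only start after the rotation step is done.
    public static Task<float> RotateThenSum(float[] angles, float[] speeds, float dt)
    {
        Task rotate = Task.Run(() => RotateAll(angles, speeds, dt));
        return rotate.ContinueWith(_ =>
        {
            float total = 0f;
            foreach (float a in angles) total += a;
            return total;
        });
    }
}
```

In the Unity listing above, the `inputDeps` handle passed to `Schedule` plays the role that the continuation plays here: it tells the scheduler which work must finish first.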

Hybrid mode

A large amount of code has already been written the traditional way, and enjoying the efficiency of ECS would require reworking it substantially. Is there a simpler way to keep the existing code largely unchanged and still improve efficiency? The answer is that ECS provides a compatible hybrid mode.

For example, the following code (reference 2):

using Unity.Entities;
using UnityEngine;

class Rotator : MonoBehaviour
{
    // The data - editable in the inspector
    public float Speed;
}

class RotatorSystem : ComponentSystem
{
    struct Group
    {
        // Define what components are required for this
        // ComponentSystem to handle them.
        public Transform Transform;
        public Rotator   Rotator;
    }

    override protected void OnUpdate()
    {
        float deltaTime = Time.deltaTime;

        // ComponentSystem.GetEntities<Group>
        // lets us efficiently iterate over all GameObjects
        // that have both a Transform & Rotator component
        // (as defined above in Group struct).
        foreach (var e in GetEntities<Group>())
        {
            e.Transform.rotation *= Quaternion.AngleAxis(e.Rotator.Speed * deltaTime, Vector3.up);
        }
    }
}

The main changes are: move the MB's Update function into the OnUpdate function of a ComponentSystem, add a Group struct to pass data between the MB and the ComponentSystem, and add a GameObjectEntity component to the original GameObject. This component extracts all the other components of the GameObject and creates an Entity, so that the ComponentSystem can traverse the corresponding GameObjects through the GetEntities function.

When it starts, Unity creates the corresponding ComponentSystem for all GameObjects that have a GameObjectEntity component attached, so you can still create GameObjects with the usual GameObject.Instantiate method. The only difference is that the Update function of the original MB is replaced by the OnUpdate function of the ComponentSystem.

With this change, you can use some of the capabilities of ECS while keeping the original code structure. In summary:

Hybrid mode separates data from behavior, updates object data in batches, avoids a virtual method call per object (the logic is abstracted into a single function), and still lets you use the inspector to observe the properties and data of a GameObject.

However, this change does not fundamentally improve anything: creation and loading times do not improve, data access is still cache-unfriendly because the data is not laid out contiguously in memory, and nothing is parallelized.

This change can only be seen as a staged refactoring step toward pure ECS. In the end, pure ECS is needed to achieve the greatest performance improvement.
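
The separation of data and behavior that this mode provides can be shown without Unity. The following is a minimal sketch, assuming hypothetical types `RotatorData` (pure data) and `RotatorBatchSystem` (behavior). It is not Unity's API, only the shape of the idea: one system call loops over all data objects instead of each object running its own Update.

```csharp
using System;
using System.Collections.Generic;

// Pure data: no Update method, no behavior.
public sealed class RotatorData
{
    public float Speed;   // degrees per second
    public float Angle;   // current angle in degrees
}

// Behavior: one call per frame updates every registered data object.
public sealed class RotatorBatchSystem
{
    private readonly List<RotatorData> _items = new List<RotatorData>();

    public void Register(RotatorData item) => _items.Add(item);

    public void OnUpdate(float deltaTime)
    {
        foreach (RotatorData item in _items)
        {
            item.Angle = (item.Angle + item.Speed * deltaTime) % 360f;
        }
    }
}
```

The `RotatorData` instances are still separate heap objects, which matches the limitation described above: the loop is batched, but the memory it touches is not contiguous.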

Related references

1) Official Unity presentation slides, ECS & Job System.

2) https://github.com/Unity-Technologies/EntityComponentSystemSamples/