With 80 million daily active users, how the back-end technical architecture of "Honor of Kings" evolved

The technical team behind "Honor of Kings" led the construction of the framework of the RTS game "Ba Three Kingdoms OL" in the PC game era. After the move to the mobile MOBA "Honor of Kings", the team gave the game substantial support, but the process was not smooth.

At this year's Tencent TGDC, which has just ended, Sun Xun, technical director of "Honor of Kings", reviewed the game's technology in the technical session. He walked the audience through the game's engine, its overall network architecture, and the attempts and changes made to its network synchronization scheme.

Sun Xun said that the game's current server architecture consists mainly of the game lobby and the PvP servers. With this architecture, "Honor of Kings" solved a series of problems that emerged later, such as putting Android and iOS players on the same server.

He also introduced the attempts "Honor of Kings" made with network protocols and synchronization schemes, and reviewed their advantages and disadvantages one by one.

He explained why the game ultimately abandoned the TCP protocol (Transmission Control Protocol) and the Client-Server structure (C/S structure) used in "Ba Three Kingdoms OL" in favor of the UDP protocol (User Datagram Protocol) and a frame synchronization scheme.

This article is compiled from the keynote speech "Technical Architecture of Honor of Kings" given by Sun Xun, technical director of Tencent's "Honor of Kings" project. It is divided into several parts covering the content and thinking behind the development of the game's back end: the background of "Honor of Kings", the back-end architecture, the adjustments made after launch, the network synchronization scheme, and the anti-cheating scheme.

At present, "Honor of Kings" has more than 4,600 back-end machines. Our capacity has been expanded to some extent, and the number of processes exceeds 40,000.

"Honor of Kings" game background

In 2012 we were making the PC game "Ba Three Kingdoms OL", the predecessor of "Honor of Kings". The product started out as an RTS-style game. We later turned it into a PC MOBA and then into a mobile MOBA, which is today's "Honor of Kings".

From 2012 to 2013 we went from a multi-unit RTS to a MOBA. In 2014 we began pre-research on a mobile MOBA, and in February 2015 we put a large team (about 100 people) into developing "Heroes of War" (the predecessor of "Honor of Kings"). The development time was not long.

In "Ba Three Kingdoms OL", players set up their strategy by arranging troops before a battle, then fight by controlling multiple units, casting skills, and exploiting the characteristics of each troop type.

When we worked on "Ba Three Kingdoms OL", the client engine was Unreal; for "Honor of Kings" we switched to the Unity engine. During the 3-4 months of development, nothing could be carried over at the code level: all the code had to be rewritten rather than reused from "Ba Three Kingdoms OL".

Some Inspirations from "Ba Three Kingdoms OL"

Making the PC game "Ba Three Kingdoms OL" gave us a lot of inspiration, for example in how the designers, the programmers and the whole team understood MOBA.

In addition, although "Ba Three Kingdoms OL" used the Client-Server model, we actually borrowed concepts from frame synchronization along the way, for example when restoring a player's view of the scene after a disconnection.

The traditional approach is to send a snapshot of the current scene when the player re-enters, followed by the subsequent downlink notifications.

The problem with this approach is that whenever another module is added to the scene, all the objects it contains and all their state have to be packaged and sent as well, which makes development and maintenance very troublesome.

Our approach was to cache every sequenced packet the server had sent and resend them in order, so that the client could fast-forward through them. The concept is similar to frame synchronization.
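The cache-and-resend idea can be sketched roughly as follows. This is a minimal illustration, not the project's code: the class name DownlinkCache, its methods, and the byte payloads are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class DownlinkCache:
    """Keeps every downlink packet of one match in send order (hypothetical helper)."""
    packets: list[bytes] = field(default_factory=list)

    def send(self, payload: bytes) -> int:
        """Record a packet as it is sent and return its sequence number."""
        self.packets.append(payload)
        return len(self.packets) - 1

    def replay_from(self, next_seq: int) -> list[tuple[int, bytes]]:
        """Return every packet the client has not applied yet, oldest first."""
        return list(enumerate(self.packets))[next_seq:]


cache = DownlinkCache()
for event in (b"spawn", b"move", b"cast", b"hit"):
    cache.send(event)

# A client that had applied packets 0 and 1 before disconnecting asks for the rest
# and fast-forwards through them in order.
missed = cache.replay_from(2)
```

Because the client rebuilds its state by replaying the same packets it would have received anyway, adding a new module to the scene does not require a new snapshot format.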

Another point is to reserve design flexibility. In the original RTS each player could control up to 5-8 units in battle; later the game became a MOBA in which each player controls a single hero. Across these scenarios our technical framework did not need any disruptive changes.

The overall structure of "Honor of Kings"

The overall design of the "Honor of Kings" back end follows from the needs of the product. If you have played "Honor of Kings", you will know that PvP matches are not divided by server.

Players in WeChat zone 1 can play against players in WeChat zone 2, and iOS players can even play with Android players. At the same time, some features keep the concept of partitions: for example, teams and leaderboards are based on the "zone". A "zone" is a number in the game, and can be understood as an identifier stamped on a player's character when it is created.

When we first implemented the architecture, the server side was relatively simple. From the prototype we kept only the lobby and the PvP server, which were separate.

The PvP server works much like a CGI call: resources are allocated for use and recycled once they are no longer needed, and the PvP server is responsible for nothing else. It takes what it needs from the lobby, uses it, and returns it to the lobby, which writes the results back to the DB.

At first the lobby and the PvP server were connected directly; we later replaced the direct connection with intermediate forwarding. In "Honor of Kings" we call it the Proxy. It acts as a proxy server that hides how the many back-end processes are distributed, because the game has many machines, many processes, and different routing rules.

For some features, such as leaderboards or teams, the logical zone number determines which machine or machines handle a request. Other messages are forwarded randomly or multicast. All of this routing is handled by the Proxy. Later we added a room server, which is responsible for matching, ranking and related functions in "Honor of Kings".
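The three routing styles mentioned here (by logical zone, random, multicast) can be illustrated with a small sketch. The Proxy class below, its method names, the node names, and the modulo rule for mapping zones to nodes are assumptions made for illustration; the talk does not describe the actual routing rules.

```python
import random


class Proxy:
    """Hypothetical routing table: service name -> list of live back-end nodes."""

    def __init__(self, backends: dict[str, list[str]]):
        self.backends = {name: list(nodes) for name, nodes in backends.items()}

    def by_zone(self, service: str, zone_id: int) -> str:
        """Zone-based routing: a zone maps to one node while the node list is unchanged."""
        nodes = self.backends[service]
        return nodes[zone_id % len(nodes)]

    def random_node(self, service: str) -> str:
        """Random forwarding for requests that any node can serve."""
        return random.choice(self.backends[service])

    def multicast(self, service: str) -> list[str]:
        """Multicast: every live node of the service receives the message."""
        return list(self.backends[service])

    def shield(self, service: str, node: str) -> None:
        """Remove a faulty node so that no further traffic is routed to it."""
        self.backends[service].remove(node)


proxy = Proxy({"rank": ["rank-0", "rank-1", "rank-2"], "room": ["room-0", "room-1"]})
zone4_target = proxy.by_zone("rank", 4)  # 4 % 3 == 1, so "rank-1"
```

Callers only name a service and, where relevant, a zone; they never need to know how many processes sit behind the Proxy.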

The room (matching) server is responsible for putting players of similar strength into the same match, which is why a team can be matched with teams from other servers.

Finally, we added an Adapter on top of this, which implements cross-server matching using the resources already deployed in each region.

In the game's back-end architecture, every module except servers such as the clan server can be scaled out online, or automatically shielded from the rest of the architecture when a fault that causes the number of online players to drop is detected.

Because routing pins, say, zones 1, 2 and 3 to a particular machine, a failure affects only the requests of players in those logical zones, which limits the scope of the failure.

With the current number of machines in "Honor of Kings", some machine breaks every week; at least one goes down. It is therefore very important that the architecture can shield failed modules automatically and scale out online.

The overall structure resembles the three-layer structure of an MMO, which is a typical design at Tencent. A player logs in to the lobby server of the specific zone the player belongs to.

A single lobby process can carry 20,000 players, and a single PvP process can carry 12,000. Whether a player logs in to WeChat zone 1 or zone 2 is only an identifier attached to the player's character.

"Honor of Kings" currently has four major regions on the public network (Android mobile QQ, Android WeChat, iOS mobile QQ and iOS WeChat), plus a preemptive (early-access) server.

Before a major version is released, we update the preemptive (early-access) server first, controlled by a switch in the program. During this time its players cannot be matched with players on the official servers, because the versions are inconsistent. Once the release reaches all servers and the versions match, we turn the switch on, and players on the preemptive server can play PvP matches with players on the official servers.

We also have a dedicated experience server, which is used for design verification. Player data on the experience server can be wiped, which is absolutely not allowed in the official environment.

Traditional mobile games used to be largely single-player, many of their protocols were kept compatible, and players could keep playing without updating the client. The main gameplay of "Honor of Kings" is PvP, and given how it is implemented, players on different versions cannot be matched together, so we did not build multi-version protocol compatibility.

Post-launch adjustments

After launch, the back-end structure of "Honor of Kings" did not change much overall. We had become familiar with this structure while making the PC game and knew what kinds of problems to expect, so the structure as a whole has been relatively stable.

We did make some fine adjustments, and most of the work went into optimizing the network itself. When "Honor of Kings" launched, there were few real-time PvP games on the market with such strict requirements on network latency.

We tried many things, such as CPU performance optimization and handling of network delay and packet loss. The network itself took the most time.

As for fine-tuning the architecture, take the forwarding module mentioned above. Our architecture has many lobby machines and many PvP machines, and no process needs to know the details of the others. For example, a lobby server does not need to know how many room servers sit behind it; it only needs to know that a room server exists and can be reached.

The Proxy's routing function decides how requests are partitioned, how load is balanced, and how faulty back-end nodes are shielded. Because there are so many lobby and PvP machines, we use the Proxy to divide the whole structure into "branches" that do not overlap: each group of Proxies is responsible for only a part of the lobby and PvP servers.

These two kinds of servers are the most numerous in "Honor of Kings". In addition to the back-end connections, the Proxies also establish connections among themselves, which reduces the number of channels each Proxy has to hold while keeping the whole structure connected.

The Proxy Adapter was added after launch. At the beginning there were only the four major regions, split by mobile QQ and WeChat and by Android and iOS, and Android players could not team up with iOS players.

There was a reason for separating Android and iOS: we had assumed that Android would be updated first and iOS later, to keep version updates stable. Later, however, we wanted Android and iOS players to be able to team up because of their social relationship chains.

So once Android and iOS were updated at the same pace, we wanted to connect PvP between the two platforms by reusing the existing PvP servers and regional resources of Android and iOS, without deploying many extra machines or doing much extra development.

Android players log in to the Android region and connect to an Android lobby; iOS players connect to a lobby in the iOS region. When they want to team up, the Adapter bridges the forwarding modules of all regions and uses an algorithm to deliver the match to one of the regions. Which region is chosen depends directly on each region's share of resources.
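The talk does not say which algorithm the Adapter uses, only that the choice follows the proportion of regional resources. One way to realize that, shown purely as an assumption, is a weighted random pick; the function pick_region, the region names, and the capacity figures below are hypothetical.

```python
import random
from collections import Counter


def pick_region(free_capacity: dict[str, int], rng: random.Random) -> str:
    """Pick a region with probability proportional to its share of capacity."""
    regions = list(free_capacity)
    weights = [free_capacity[name] for name in regions]
    return rng.choices(regions, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
capacity = {"android-wechat": 3000, "ios-wechat": 1000}  # made-up numbers
counts = Counter(pick_region(capacity, rng) for _ in range(4000))
```

With these weights, roughly three quarters of cross-platform matches would land on the region that holds three quarters of the capacity.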

Network synchronization scheme

"Ba Three Kingdoms OL" used the Client-Server mode, in which the server makes the judgments and the client presents them. So why did we choose frame synchronization for "Honor of Kings"?

The benefits of the Client-Server mode are:

First, security. Because everything is computed on the server, the client is responsible only for presentation and cannot affect the outcome of any judgment.

Second, because the Client-Server mode is result-based, packets can be lost along the way; the loss can be tolerated and handled as long as the final result is consistent.

Frame synchronization is used more in PC games. DotA, which everyone is familiar with, and "StarCraft" both use frame synchronization.

Frame synchronization places stricter requirements on the network. The sequence of commands sent for execution cannot tolerate packet loss, and the order must be strictly preserved: if the packets are 1, 2, 3, 4, 5, they must be executed as 1, 2, 3, 4, 5. If a packet is lost, execution must wait until the lost packet arrives before continuing with the rest of the sequence.
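The strict-ordering requirement can be illustrated with a small receive buffer that holds back later frames until the missing one arrives. This is a minimal sketch; the class OrderedFrameBuffer and the choice to start numbering at 1 are assumptions made to mirror the 1, 2, 3, 4, 5 example.

```python
class OrderedFrameBuffer:
    """Releases frames strictly in sequence order (hypothetical helper)."""

    def __init__(self) -> None:
        self.next_seq = 1   # next frame number that may execute
        self.pending = {}   # frames that arrived early, keyed by number

    def receive(self, seq: int, frame: str) -> list[str]:
        """Store a frame and return the frames that can now execute, in order."""
        if seq >= self.next_seq:  # ignore duplicates of frames already executed
            self.pending[seq] = frame
        ready = []
        while self.next_seq in self.pending:
            ready.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return ready


buf = OrderedFrameBuffer()
first = buf.receive(1, "a")        # frame 1 runs immediately
blocked = buf.receive(3, "c")      # frame 3 must wait for frame 2
released = buf.receive(2, "b")     # frame 2 arrives, so 2 and 3 run together
duplicate = buf.receive(2, "b")    # a late duplicate is ignored
```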

A MOBA has many units; the client can have close to 100 units on screen at once. If an AOE skill hits 20 units and applies a debuff to each, the Client-Server state-synchronization mode has to send all of that information, so there is potentially a lot of state to synchronize.

The other issue is development: in the Client-Server mode it is harder to make the client's presentation match the server's judgment perfectly.

When we made the PC MOBA, developing one hero's skills took us two or three weeks. The development cycle of "Honor of Kings" at that time was three or four months. Under that time pressure we could not have finished with the Client-Server approach.

The team was quite nervous then, because no mobile game with strong PvP and strict real-time requirements had used this approach before.

The anti-jitter ability of a frame-synchronized network is relatively weak, because it cannot tolerate packet loss. If you are interested in the basic principles of frame synchronization, you can look them up afterwards.

Implementations generally distinguish between a network mode and a host mode. The key point of the technique is that everything in a match is computed on the clients. Each of the 10 players runs the computation independently, with the same starting point, the same inputs, and exactly the same intermediate logic, with no true randomness, so the results at any moment should in theory be identical.

Even floating-point arithmetic should be avoided, because it has precision issues. Many things, including collision, animation and the basic math library, are implemented by our back-end team, with floating-point values converted to integers, and local client logic must be avoided. Using local logic is the easiest mistake to make and the most common cause of going out of sync.

If an inexperienced client programmer writes logic that depends on local code, the results drift further and further apart, and the 10 players end up in parallel worlds.
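Replacing floating point with integers usually means fixed-point arithmetic. The following is a minimal sketch under stated assumptions: the scale of 1024, the helper names to_fixed and fx_mul, and the speed and tick values are all illustrative and not taken from the game.

```python
SCALE = 1024  # assumed resolution: 10 fractional bits per unit


def to_fixed(numerator: int, denominator: int = 1) -> int:
    """Convert the rational numerator/denominator to fixed point using integers only."""
    return numerator * SCALE // denominator


def fx_mul(a: int, b: int) -> int:
    """Multiply two fixed-point values; the integer result is identical on every device."""
    return a * b // SCALE


speed = to_fixed(7, 2)     # 3.5 units per second -> 3584
dt = to_fixed(66, 1000)    # one 66 ms logic tick -> 67 (rounded down)
position = 0
for _ in range(15):        # advance one second of logic ticks
    position += fx_mul(speed, dt)
```

Every client that runs this loop ends with exactly the same integer position. One detail to watch when porting such code: Python's // rounds toward negative infinity, while integer division in C-family languages truncates toward zero, so all clients must share one rounding rule.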

The overall network structure is generally divided into three layers: the server, the client logic layer, and the client presentation layer.

The server is mainly responsible for two things:

  • Collecting the upstream input of all players, packing it into an input sequence at fixed intervals, and delivering it to all clients.

  • Resending packets when a client loses them, and removing redundant upstream input from clients: for example, when a new input arrives, the old input is dropped or replaced.

In "Honor of Kings", our logic is to synchronize one packet every 66 milliseconds, that is, about 15 packets per second. None of them can be missing, because frame synchronization cannot tolerate packet loss and the packets must be executed in strict order.
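The server's collect-and-broadcast role can be sketched as below. This is an illustration under assumptions: the class FrameCollector, the dictionary frame format, and the rule that only the newest input per player survives within a tick are hypothetical simplifications.

```python
TICK_MS = 66  # one broadcast every 66 ms, about 15 per second


class FrameCollector:
    """Gathers upstream input and emits one numbered frame per tick (hypothetical)."""

    def __init__(self) -> None:
        self.frame_no = 0
        self.latest = {}  # player id -> most recent input in the current tick

    def on_input(self, player: int, command: str) -> None:
        """A newer input from the same player replaces the older one."""
        self.latest[player] = command

    def tick(self) -> dict:
        """Close the current interval and return the frame to send to every client."""
        frame = {"frame": self.frame_no, "inputs": dict(sorted(self.latest.items()))}
        self.frame_no += 1
        self.latest = {}
        return frame


collector = FrameCollector()
collector.on_input(1, "move_left")
collector.on_input(2, "cast_skill")
collector.on_input(1, "move_up")   # replaces player 1's earlier input
frame0 = collector.tick()
frame1 = collector.tick()          # an empty frame is still sent to keep the sequence
```

Sending a frame on every tick, even an empty one, keeps the frame numbers contiguous so that clients can detect a gap.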

The client logic layer can be thought of as a local server running on the client. The results computed by all clients must be strongly consistent, so it cannot contain true randomness, local logic, or floating-point arithmetic. Given the same input, it must produce the same result.

The client presentation layer copies or mirrors the data in the logic layer and smooths it for display. The rendering frame rate may differ, but this does not affect the final computed result, only how animations and actions look.
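One common way to smooth a low-rate logic layer for display is linear interpolation between the last two logic states. The talk does not say which smoothing method is used, so the function render_position below and its parameters are assumptions. Floating point is acceptable here only because the rendered value is never fed back into the logic layer.

```python
def render_position(prev, curr, elapsed_ms, tick_ms=66):
    """Blend two logic-layer positions for display; the logic state is not modified."""
    alpha = min(max(elapsed_ms / tick_ms, 0.0), 1.0)  # progress through the tick
    return tuple(p + (c - p) * alpha for p, c in zip(prev, curr))


# Logic positions are integers produced once per 66 ms tick; rendering may run faster.
halfway = render_position((0, 0), (66, 132), elapsed_ms=33)
late = render_position((0, 0), (66, 132), elapsed_ms=200)  # clamped at the newer state
```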

When PvP first went live, we used TCP. TCP performs well on a local area network, but on the public network, when packet loss or jitter occurs, it is limited by how it is implemented.

For example, because of the window mechanism, slow start and so on, the game stuttered badly whenever a reconnection occurred. So we dropped TCP and switched to UDP; if packets are lost, the server resends them at the application layer.

UDP is limited by the MTU (maximum transmission unit). A datagram larger than the MTU is fragmented, and the loss of one fragment can cause the whole datagram to be lost.

Therefore the server splits larger messages into fragments at the application layer. If a fragment is lost along the way, the server resends it, and the fragments are assembled into the whole message before it is unpacked.
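Application-layer splitting and reassembly can be sketched as follows. The tuple layout (message id, index, total, chunk), the 1,200-byte chunk size, and the names fragment and Reassembler are assumptions for illustration; the real wire format is not described in the talk.

```python
def fragment(msg_id: int, payload: bytes, max_chunk: int):
    """Split a payload into (msg_id, index, total, chunk) pieces below the MTU."""
    chunks = [payload[i:i + max_chunk] for i in range(0, len(payload), max_chunk)] or [b""]
    return [(msg_id, index, len(chunks), chunk) for index, chunk in enumerate(chunks)]


class Reassembler:
    """Collects fragments in any order and rebuilds the message (hypothetical)."""

    def __init__(self) -> None:
        self.parts = {}  # msg_id -> {index: chunk}

    def add(self, msg_id: int, index: int, total: int, chunk: bytes):
        """Return the whole payload once all fragments are present, else None."""
        got = self.parts.setdefault(msg_id, {})
        got[index] = chunk
        if len(got) < total:
            return None
        del self.parts[msg_id]
        return b"".join(got[i] for i in range(total))

    def missing(self, msg_id: int, total: int) -> list[int]:
        """Fragment indexes the server would still have to resend."""
        got = self.parts.get(msg_id, {})
        return [i for i in range(total) if i not in got]


payload = bytes(range(256)) * 10            # 2,560 bytes
pieces = fragment(7, payload, max_chunk=1200)
receiver = Reassembler()
receiver.add(*pieces[0])
receiver.add(*pieces[2])                    # fragment 1 was lost on the way
to_resend = receiver.missing(7, total=3)
rebuilt = receiver.add(*pieces[1])          # the resent fragment completes the message
```

Only the lost fragment has to be resent, instead of the whole datagram being lost as with IP-level fragmentation.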

What is more valuable with UDP is redundancy. When a phone loses packets because of signal jitter or similar causes, sending redundant data is the more effective remedy.

Frame synchronization messages are relatively small. In theory, at 15 driving frames per second, the recording of a 20-minute match is about 10 MB. According to our public-network statistics, however, the recording of a normal 20-minute 5v5 match is about 3 MB.

The server keeps the players' operations entirely in memory. When a packet is lost, the server quickly looks up the cached data by its number and resends it. We also adjust the amount of redundancy sent to each player according to that player's packet loss.

Initially each packet redundantly carries the data of the previous 3 frames. If packet loss is severe, we try to send more redundant data. After the client receives it, it tries to compress the time spent executing the logic.
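The effect of carrying the previous 3 frames in every packet can be checked with a small simulation. The helper names and the loss-rate thresholds in redundancy_for are hypothetical; the talk only says that redundancy starts at 3 frames and increases when loss is severe.

```python
def build_packet(frames: list[str], latest: int, redundancy: int) -> dict[int, str]:
    """Packet for frame `latest` that also repeats up to `redundancy` earlier frames."""
    first = max(0, latest - redundancy)
    return {n: frames[n] for n in range(first, latest + 1)}


def redundancy_for(loss_rate: float, base: int = 3, cap: int = 9) -> int:
    """Hypothetical policy: repeat more frames as the measured loss rate grows."""
    if loss_rate < 0.05:
        return base
    if loss_rate < 0.20:
        return min(cap, base * 2)
    return cap


def frames_received(frames: list[str], dropped: set[int], redundancy: int) -> list[int]:
    """Frame numbers a client ends up with when the packets in `dropped` are lost."""
    got = {}
    for latest in range(len(frames)):
        if latest not in dropped:
            got.update(build_packet(frames, latest, redundancy))
    return sorted(got)


frames = [f"input-{n}" for n in range(10)]
survives_three = frames_received(frames, dropped={4, 5, 6}, redundancy=3)
needs_resend = frames_received(frames, dropped={4, 5, 6, 7}, redundancy=3)
```

With 3 frames of redundancy, three consecutive lost packets are recovered by the next packet that arrives. A fourth consecutive loss leaves a gap (frame 4 here) that only a server resend or a higher redundancy level can fill.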

The troublesome part of frame synchronization is that, unlike the Client-Server mode, a client that crashes has to run the match again from the beginning; the intermediate computation cannot be skipped.

We also tried other methods. For example, instead of collecting input and sending it at fixed intervals, the server forwarded each client's upstream input immediately, tagged with a frame number, so that responses were more timely and the feedback on operations felt stronger and faster.

The result was that the improvement in feel was minimal while the downsides were huge: delivery was no longer a fixed 15 packets per second, and the number of packets delivered became very large and dependent on each player's operating habits.

One player might generate ten or twenty inputs within a second, each of which had to be packaged and sent to the clients. Because the clients received so many packets, the devices also became noticeably hot.

We also worked with other departments on a TCP-like technology. The intuition was that lost packets would simply be retransmitted at the IO layer.

In practice, however, the technology sat at too low a level, so control over packet loss was not flexible enough, and the results could be worse than TCP itself.

Traditional frame synchronization uses delayed delivery, which we also tried. If packets are lost within the interval, or the network fluctuates while packets are being sent downstream, delaying delivery smooths out the jitter and packet loss.

We tried this solution but did not adopt it, because some heroes in "Honor of Kings" play like action-game characters and need fast response. Delayed delivery is really good at resisting jitter and packet loss, but the feel did not meet our requirements.

In addition, Client-Server implementations generally follow a standard routine: the client acts ahead of the server, then smooths or pulls the character back according to what the server reports.

We tried this solution as well, but finally gave it up, because it makes the character's movement feel a little floaty.

When the player moves, the client shows the movement immediately, but offsets or corrections are then applied according to the server's downlink. When the network jitters, the character appears to drift, so we abandoned this solution.

In a frame synchronization scheme, all clients run the computation and are expected to reach the same results. But if there is a bug, or someone uses a cheat tool, that client's results differ from everyone else's. When such a difference occurs, we say the client is out of sync.

We periodically extract some key information and hash it; the hash of a player who is out of sync differs from the others'.
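The hash comparison can be sketched as below. Which fields count as "key information", the use of SHA-256, and the majority rule for deciding who is out of sync are all assumptions; the talk does not specify them.

```python
import hashlib
from collections import Counter


def state_hash(frame_no: int, units) -> str:
    """Hash a frame number plus (unit_id, x, y, hp) integer tuples in a fixed order."""
    digest = hashlib.sha256()
    digest.update(frame_no.to_bytes(4, "big"))
    for unit in sorted(units):  # sort so that iteration order cannot change the hash
        for value in unit:
            digest.update(value.to_bytes(4, "big", signed=True))
    return digest.hexdigest()


def find_out_of_sync(reports: dict[str, str]) -> list[str]:
    """Players whose reported hash differs from the most common one (assumed rule)."""
    majority, _ = Counter(reports.values()).most_common(1)[0]
    return sorted(player for player, value in reports.items() if value != majority)


good = [(1, 100, 200, 3000), (2, -50, 75, 2500)]
tampered = [(1, 100, 200, 3000), (2, -50, 75, 9999)]  # e.g. hit points edited locally
reports = {
    "p1": state_hash(300, good),
    "p2": state_hash(300, list(reversed(good))),
    "p3": state_hash(300, tampered),
}
suspects = find_out_of_sync(reports)
```

Only a short digest travels upstream per check, so the comparison stays cheap even when the full game state is large.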

The out-of-sync rate of "Honor of Kings" was about 2% at launch: in 2 out of every 100 matches, one or more players ended up with results different from the others. We have now brought the rate down to 3 in 10,000, so it happens in only 3 out of every 10,000 matches.

How did we improve it? If you use frame synchronization, you will certainly run into out-of-sync problems: the client code is wrong and uses local logic, or a floating-point error reaches a critical point and produces inconsistent results.

We use several methods. One is automated testing with bots that run continuously: before a new hero is launched, for example, scripted tests run continuously to see whether they produce out-of-sync results. We also have a dedicated experience server and a preemptive (early-access) server ahead of the official release, so that problems are exposed on the test networks first and then fixed.

In addition, when a match goes out of sync, we upload and save the whole recording together with the clients' logs, so that we can quickly locate the problem from the recording and the sequence of log entries executed in between.

We also monitor latency and the quality of each match: whether the match stuttered and how many times, whether packets were lost and how many, and the maximum delay and maximum jitter. All of these are recorded and counted.

Our colleagues in the operations department have helped us a great deal, and we have integrated SDKs for network speed measurement and problem analysis.

According to our own statistics, the main causes of game lag are:

  • Busy bandwidth in residential communities. Many communities share a common bandwidth outlet, so if someone is streaming a movie or watching a live broadcast and taking up a lot of bandwidth, your game may stutter.

  • High Wi-Fi router latency. A home Wi-Fi router that has not been restarted for a long time, too many connected devices, channel interference, or other applications downloading at high volume will all affect "Honor of Kings".

  • Poor mobile signal, signal jitter, and packet loss on the Wi-Fi or 4G air interface.

We have made many attempts at network optimization, such as increasing redundancy according to packet loss, and we have optimized execution efficiency in every respect to reduce CPU usage.

On the back end of "Honor of Kings", there are two things we keep working on: network optimization and the matching mechanism. We are trying various methods, and later will even try AI and deep learning, to determine each player's true level more accurately, so that players are matched with opponents and teammates who really are at the same level.
