Jiangyu Interactive: Exploration of Serverless Architecture in the Game Field

Jiangyu Interactive is an emerging game company. Since its founding in 2018 it has targeted the global game market and gained a foothold in that highly competitive market by creating engaging game experiences. In just two years, a single product, Topwar (Pocket Aid), carried Jiangyu Interactive into the top 30 Chinese game publishers operating overseas. Guided by its mission, "Chinese games, the future can be expected", the company keeps broadening its range of games in the hope of bringing more enjoyment to players around the world.

As the business has grown rapidly, the scale and complexity of the game server systems have changed dramatically. Fortunately, Jiangyu Interactive has a highly effective technical team. Although the team is not large, it has kept exploring cutting-edge technology and has kept the system architecture up to date through various means, in order to better support business needs and reduce IT costs.

One very important piece of work across the many iterations of the technical architecture was to abstract the business capabilities common to game scenarios, separate them from the main game server, and consolidate them into a unified service layer that supports Jiangyu Interactive's multiple games in a modular manner. The capabilities separated from the main server include account management, IM, content security, the membership system, message push, and game behavior analysis. First, this reduces the business complexity of the main game server and lets it focus on supporting the core game scenarios. In addition, the common capabilities can be reused across multiple games, which reduces R&D costs and improves R&D efficiency.

Splitting out capabilities and reducing business coupling makes continuous iteration and the evaluation of new technology easier, and it also gave Jiangyu Interactive an opportunity to explore cloud-native serverless in depth. A serverless architecture can take full advantage of the rapid elasticity of computing resources and is an important direction in cloud computing. In the game field, the main game server carries complex core business logic, needs to run for a long time, and interacts with many player clients at extremely low latency, so it still has to run on virtual machines or containers. The peripheral business scenarios separated from the main server therefore became the preferred targets for piloting a serverless architecture.

New online translation needs of Jiangyu Interactive

The online translation business was the first scenario chosen for the serverless pilot, and it is tied to Jiangyu Interactive's globalization strategy. The company's flagship title, Topwar, is a game for the global market and attracts players from all over the world. Whenever we open the game, we can see players who speak different languages and display different national flags happily discussing all kinds of game-related topics.

In this scenario, a simple online translation function brings players from all over the world together and gives them a new kind of user experience. This simple, easy-to-use design is also one of the reasons Topwar has received high ratings and praise from players in all the major app stores.

For Jiangyu Interactive, building a real-time translation tool covering dozens of languages from scratch is obviously unrealistic. Fortunately, chat between players is usually short, and a translation does not need to be 100% accurate to be understood. What matters more is how quickly the backend responds. Online platforms such as Google Translate already provide powerful online translation capabilities, so a player's request can be forwarded to a third-party platform after some simple preprocessing.

The function is very simple, but implementing it still poses some challenges for the technical architecture. The number of concurrently online players varies over the day, with obvious peaks and valleys. When many players are online at the same time, a very large volume of chat is generated. Moreover, chat volume is not simply proportional to the number of players online: a hot event can trigger heated discussion among players around the world, and the number of messages that need online translation then rises sharply. This calls for an elastic architecture to handle players' translation requests.

The initial architecture was implemented with Server Load Balancer (SLB) in front of a PHP application cluster built on the EasySwoole framework.

In this architecture, the main application, written in PHP, performs a series of preprocessing steps on each translation request, including symbol code replacement and sensitive content filtering, and then forwards the request to a third-party translation platform to obtain the result. This is a very widely adopted architecture with high concurrent processing capability. In the cloud era, the elasticity of cloud resources also lets the throughput of the whole cluster adjust dynamically as business volume changes. From a cloud-native perspective, however, this architecture still has some shortcomings when it runs at scale in production.
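
The preprocess-then-forward flow described above can be sketched roughly as follows. This is a minimal illustration rather than Jiangyu Interactive's implementation: the symbol codes, the blocked-term list, and the `backend` callable that stands in for the third-party translation platform are all hypothetical, since the article does not give them.

```python
import re

# Hypothetical in-game symbol codes; the real codes are not given in the article.
SYMBOL_CODES = {"[smile]": ":)", "[heart]": "<3"}

# Hypothetical list of blocked terms, used only for illustration.
SENSITIVE_TERMS = {"badword"}


def replace_symbol_codes(text: str) -> str:
    """Replace each known symbol code with its plain-text form."""
    for code, plain in SYMBOL_CODES.items():
        text = text.replace(code, plain)
    return text


def mask_sensitive(text: str) -> str:
    """Mask every blocked term (case-insensitive) with asterisks of equal length."""
    for term in SENSITIVE_TERMS:
        text = re.sub(
            re.escape(term),
            lambda match: "*" * len(match.group(0)),
            text,
            flags=re.IGNORECASE,
        )
    return text


def preprocess(text: str) -> str:
    """Trim the message, replace symbol codes, then mask sensitive content."""
    return mask_sensitive(replace_symbol_codes(text.strip()))


def translate_message(text: str, target_lang: str, backend) -> str:
    """Preprocess a chat message and forward it to a translation backend.

    `backend` is any callable (text, target_lang) -> str; it is a stand-in
    for the third-party platform, not a real client library.
    """
    return backend(preprocess(text), target_lang)
```

Passing the backend in as a parameter keeps the preprocessing testable without any network access.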

1. Heavy maintenance workload. Maintenance spans many layers, including virtual machines, networks, load balancing components, operating systems, and applications, and it takes a great deal of time and effort to keep the system highly available and stable. To take the simplest example: when an application instance fails, how can the fault be located and the instance removed from the cluster as quickly as possible? All of this depends on a complete monitoring mechanism and a fault isolation and recovery mechanism.

2. Elastic scaling lags behind demand. Whether scaling is triggered by scheduled tasks or by metric thresholds (CPU utilization, memory utilization, and so on), resources cannot be managed at a fine granularity based on the actual number of requests. When the density of chat requests rises sharply, scaling lags behind. Even with optimizations such as Kubernetes and reserved resource pools, it often takes a few minutes to bring up a new instance.

3. Low resource utilization. Because elastic scaling lags, the scaling strategy has to be relatively conservative, which lowers resource utilization. The most direct result is higher resource cost.
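
The scaling lag in point 2 can be illustrated with a small simulation. It is only a sketch under stated assumptions: the tick length, per-instance capacity, utilization threshold, and boot delay are made-up parameters, and the scaling rule is a simplified model rather than the behavior of any particular autoscaler.

```python
def simulate_threshold_scaling(demand, per_instance, start_instances,
                               threshold, boot_delay):
    """Simulate threshold-triggered scale-out with a provisioning delay.

    demand          -- requests arriving in each tick (hypothetical numbers)
    per_instance    -- requests one instance can serve per tick
    start_instances -- instances running at tick 0 (must be at least 1)
    threshold       -- utilization above which scale-out is triggered
    boot_delay      -- ticks before a newly requested instance is ready

    Returns a list of (tick, ready_instances, unserved_requests).
    """
    ready = start_instances
    pending = []  # ticks at which requested instances become ready
    history = []
    for tick, requests in enumerate(demand):
        # Instances that have finished booting join the pool.
        ready += sum(1 for ready_at in pending if ready_at == tick)
        pending = [ready_at for ready_at in pending if ready_at > tick]

        capacity = ready * per_instance
        history.append((tick, ready, max(0, requests - capacity)))

        # Scale out only after utilization has already crossed the threshold.
        if requests / capacity > threshold:
            needed = -(-requests // per_instance)  # ceiling division
            missing = needed - ready - len(pending)
            pending.extend([tick + boot_delay] * max(0, missing))
    return history


# A spike from 100 to 500 requests per tick, with a boot delay of 3 ticks.
history = simulate_threshold_scaling(
    demand=[100, 100, 500, 500, 500, 500],
    per_instance=100, start_instances=2, threshold=0.8, boot_delay=3,
)
for tick, ready, unserved in history:
    print(tick, ready, unserved)
```

In this run the cluster stays at two instances for three ticks after the spike begins, leaving 300 requests per tick unserved until the new instances are ready. Avoiding that gap is what pushes operators toward conservative, over-provisioned settings.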

What are the advantages of the serverless solution based on Alibaba Cloud Function Compute (FC)?

Is there a solution that lets the technical team focus on implementing business logic while allocating resources at a fine granularity according to players' actual requests, so that resource usage is maximized? With the rapid development of cloud computing, the major cloud vendors are actively exploring new solutions and applying more "cloud native" ideas to problems of cost and efficiency. The serverless solution based on Alibaba Cloud Function Compute (FC) is an outstanding representative of this field.

Function Compute (FC) is an event-driven, fully managed computing service. With Function Compute, developers do not need to manage infrastructure such as servers; they only write and upload code. Function Compute automatically prepares the computing resources, runs the business logic elastically and reliably, and provides additional features such as log query, performance monitoring, and alerting to keep the system running stably.

Compared with the traditional model, in which application servers run continuously to provide services, the biggest difference of Function Compute is that it brings up computing resources on demand to process tasks and automatically reclaims them once the tasks are complete. This is a truly serverless approach, which maximizes resource utilization and reduces both the maintenance workload and the cost of use. Because computing resources do not have to be requested in advance, users do not need to think about capacity evaluation or elastic scaling at all and pay only for the resources actually used.
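
A rough way to see why on-demand allocation improves utilization is to compare a fleet sized for peak load with resources that track actual demand. The hourly figures below are invented for illustration and "units" are an abstract measure of capacity; no real prices or measurements from the article are implied.

```python
def always_on_units(hourly_demand):
    """Capacity-hours consumed when the fleet is sized for the peak hour."""
    return max(hourly_demand) * len(hourly_demand)


def on_demand_units(hourly_demand):
    """Capacity-hours consumed when resources follow actual demand."""
    return sum(hourly_demand)


def peak_sized_utilization(hourly_demand):
    """Fraction of a peak-sized fleet that does useful work."""
    return on_demand_units(hourly_demand) / always_on_units(hourly_demand)


# Hypothetical demand profile with one busy hour.
demand = [10, 10, 20, 40, 100, 20]
print(always_on_units(demand))          # 600
print(on_demand_units(demand))          # 200
print(peak_sized_utilization(demand))   # about 0.33
```

The more pronounced the peaks and valleys, the lower the utilization of a peak-sized fleet, and the larger the gap that pay-per-use billing can close.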

Putting serverless into practice in the game field

Simple business logic such as online translation is easy to migrate from the traditional architecture to a serverless one. Jiangyu Interactive treats each translation request initiated by a player as a Function Compute task: the platform brings up computing resources to process it and releases them automatically once the task completes. Because the technical team is most familiar with Java, the online translation function was implemented in Java during the serverless migration, which also lets it draw on the rich Java ecosystem. Function Compute itself does not restrict users to a particular language or kind of business logic, and mainstream development languages are all well supported. After the migration, the system architecture of the online translation business became simpler.

Functions configured with HTTP triggers respond directly to requests initiated by players, and the platform schedules computing resources to process them elastically and reliably. Because Function Compute's task allocation follows front-end user traffic exactly, Server Load Balancer (SLB) is no longer needed and can be removed from the architecture. Long-running application clusters are no longer needed either: the Function Compute platform can quickly bring up a large number of computing resources to execute tasks concurrently while keeping the whole architecture highly available. Redis is used to cache translations of some high-frequency simple phrases, reducing dependence on the third-party platform. The biggest surprise this simplification brought to the Jiangyu Interactive technical team is that capacity planning and elastic scaling management are no longer required, so the team can concentrate on meeting business needs and on business innovation in more areas.
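
The cache-in-front-of-the-translator idea can be sketched as below. This is not Function Compute's handler API and not the team's Java code: a plain dictionary stands in for Redis, `translate` is a hypothetical callable standing in for the third-party platform, and the length cutoff used to decide what counts as a "simple" phrase is an assumption.

```python
from typing import Callable, MutableMapping


def make_handler(cache: MutableMapping[str, str],
                 translate: Callable[[str, str], str],
                 max_cached_len: int = 32):
    """Build a request handler that consults a cache before translating.

    cache          -- any dict-like store; a stand-in for Redis here
    translate      -- callable (text, target_lang) -> translated text
    max_cached_len -- only short phrases are cached (assumed cutoff)
    """
    def handle(text: str, target_lang: str) -> str:
        key = f"{target_lang}:{text}"
        cached = cache.get(key)
        if cached is not None:
            return cached  # cache hit: no call to the translation platform
        result = translate(text, target_lang)
        if len(text) <= max_cached_len:
            cache[key] = result
        return result

    return handle


# Usage with a fake translator that records how often it is called.
calls = []


def fake_translate(text, target_lang):
    calls.append(text)
    return f"[{target_lang}] {text}"


handle = make_handler({}, fake_translate)
handle("gg", "fr")
handle("gg", "fr")   # served from the cache
print(len(calls))    # 1
```

Because the handler keeps no state of its own beyond the external cache, any instance the platform starts can serve any request, which suits the on-demand execution model.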

Compared with languages such as Node.js, Java instances take longer to initialize and load classes. Although Function Compute (FC) can provision computing resources within milliseconds thanks to a variety of optimizations, it often takes a few seconds before a Java program is actually ready to serve requests, which is a serious disadvantage for latency-sensitive services such as online translation. Alibaba Cloud's answer for latency-sensitive businesses is a pair of techniques: multiple concurrent requests per instance, and reserved instances.

With single-instance multi-concurrency, each function instance can process up to 100 tasks concurrently, which reduces average execution time, saves cost, and lowers the probability of a cold start. With reserved instances, computing resources are allocated in advance according to the function's load pattern, so that requests can still be served by reserved instances while additional instances are being started, thereby eliminating the latency spikes caused by cold starts.
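
The effect of the two techniques on instance counts can be worked through with simple arithmetic. The request volumes and reserved counts below are hypothetical, and the model deliberately ignores request duration and scheduling details; only the limit of 100 concurrent tasks per instance comes from the text above.

```python
def instances_needed(concurrent_requests: int, per_instance_concurrency: int) -> int:
    """Instances required to hold the given number of in-flight requests."""
    return -(-concurrent_requests // per_instance_concurrency)  # ceiling division


def on_demand_instances(concurrent_requests: int,
                        per_instance_concurrency: int,
                        reserved_instances: int) -> int:
    """Instances that must be started on demand beyond the reserved ones.

    Only these instances can incur a cold start in this simplified model.
    """
    needed = instances_needed(concurrent_requests, per_instance_concurrency)
    return max(0, needed - reserved_instances)


# 1,000 in-flight requests with one request per instance versus 100 per instance.
print(instances_needed(1000, 1))           # 1000
print(instances_needed(1000, 100))         # 10

# With 10 reserved instances, the same load needs no on-demand instances.
print(on_demand_instances(1000, 100, 10))  # 0
print(on_demand_instances(1500, 100, 10))  # 5
```

Fewer instances for the same load means fewer JVM startups, which is why raising per-instance concurrency lowers the chance that a request lands on a cold instance.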

After the migration, the online translation business uses a serverless architecture that consumes computing resources entirely on demand and makes full use of the elasticity of cloud computing. In terms of cost, because applications no longer need to run continuously to provide services, cloud resource usage matches the actual business volume, which substantially raises average resource utilization. In terms of system throughput, because Function Compute (FC) can bring up tens of thousands of instances in a short period of time, it can support massive concurrency during business peaks or sudden surges in user requests, and the preliminary work of capacity evaluation is no longer required. In terms of system maintenance, there is no need to reserve computing resources or maintain the underlying software and hardware, which greatly reduces operating costs and lets the technical team of Jiangyu Interactive focus on implementing complex business logic and on technical innovation. In the online translation scenario, the serverless solution based on Function Compute (FC) helps Jiangyu Interactive save more than 40% of its IT costs compared with the traditional architecture.

Another feature that has noticeably improved R&D efficiency at Jiangyu Interactive is the version and alias management provided by Function Compute (FC). A version is equivalent to a snapshot of a service, and users can publish one or more versions of a service. Combined with the alias mechanism, this supports continuous integration and continuous delivery across the software development life cycle and makes gray (canary) releases of a service very convenient.
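
A gray release through an alias amounts to splitting traffic between versions by weight. The sketch below is a simplified model of that idea, not Function Compute's API: the alias is represented as a plain dictionary of hypothetical version names and weights.

```python
import random


def pick_version(alias_weights: dict, rng: random.Random) -> str:
    """Choose the version that serves one request, according to alias weights.

    alias_weights -- mapping of version name to traffic weight (hypothetical)
    rng           -- random number generator, injected so runs are repeatable
    """
    versions = list(alias_weights)
    weights = [alias_weights[version] for version in versions]
    return rng.choices(versions, weights=weights, k=1)[0]


# Route roughly 10% of traffic to the new version during a gray release.
alias = {"v1": 0.9, "v2": 0.1}
rng = random.Random(0)
picks = [pick_version(alias, rng) for _ in range(10_000)]
print(picks.count("v2") / len(picks))

# Promoting the new version is just a weight change on the alias.
alias = {"v2": 1.0}
print(pick_version(alias, rng))  # v2
```

Because callers address the alias rather than a specific version, shifting or rolling back traffic does not require any change on the client side.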

In later optimizations of the architecture, Jiangyu Interactive will try to preprocess as much of the original content as possible with machine learning in order to reduce dependence on third-party platforms. In AI inference, the advantages of the serverless architecture still apply: a pre-trained deep learning model can be run on a large number of computing resources scheduled in a short time for large-scale parallel processing.

After the successful serverless trial in the online translation scenario, Jiangyu Interactive went on to look for other business areas that suit serverless technology and has introduced it in areas such as the push service, content security, and game behavior analysis. In the future, Jiangyu Interactive will continue to explore serverless architecture in line with its own technical characteristics and to benefit fully from cloud computing as it adopts new technologies.

Author: Hunting Hill, brave Wang, Zhang Yu

This article is the original content of Alibaba Cloud and may not be reproduced without permission.

Origin blog.csdn.net/weixin_43970890/article/details/113876425