Tencent's Jue Wu AI "Complete Body" opens for a limited-time public trial, with research accepted at a top international conference and a top journal

Thank you for reading the 112th article from the Tencent AI Lab WeChat account. This article introduces the technical methods behind the fully upgraded version of Jue Wu, and readers are welcome to test its strength in the Honor of Kings app.

Tencent AI Lab has announced an upgraded version of the strategic and collaborative AI "Jue Wu", developed jointly with Honor of Kings.

A new algorithm lifts the limit on playable heroes (the hero pool grows from 40 to 100+), so the AI masters every skill of every hero and can handle on the order of 10^15 possible hero combinations;

The ban/pick (BP) drafting strategy has been optimized, so the AI can field the best hero lineup based on factors such as its own strengths and the opponent's picks.

The related research has been accepted at the top AI conference NeurIPS 2020 and by the top journal TNNLS, demonstrating Tencent's world-class AI research and application capabilities.

The reinforcement learning research was accepted at the top AI conference NeurIPS 2020

Paper address: https://arxiv.org/abs/2011.12692

The "Complete Body" version of Jue Wu will be available for a limited time in the Honor of Kings app, letting the public experience the AI's capabilities in complex strategy, teamwork, and micro-operation first hand. The event runs from November 14 to 30. Jue Wu has 20 levels of increasing strength; the strongest, level 20, opens on November 28 for 5v5 team challenges.

AI strategy: the red-side AI Kai shows strong map awareness, circling behind to ambush from the brush and turn the fight around

AI micro-operation: in a small skirmish, the blue-side AI defuses the attack with precise play

AI collaboration: the blue-side AI coordinates perfectly in a team fight and wins while outnumbered

AI micro-operation: AI Gongsun Li chains a flawless combo and outplays three opponents alone

From 40 to 100+: the hero pool is fully unlocked

Shaolin has seventy-two arts. Each art differs, and so does its method of practice. A student who truly masters even one of them is unmatched in the world.

In Honor of Kings, a player who reaches purple proficiency with four heroes in every class unlocks the "Almighty Master" title. But with limited practice time and energy, few people can master all heroes.

But Jue Wu did it. Within a year the technical team raised the number of heroes it has mastered from 1 to 100+, fully unlocking the hero pool. This version is named the Jue Wu "Complete Body".

Jue Wu's capability roadmap: from MOBA novice to professional level

All heroes in Jue Wu share a single set of model parameters. Learning one lineup from scratch is easy, but learning many hero combinations is far harder. Because the map is huge and information is incomplete, each combination of 10 heroes calls for its own strategic planning, skill usage, path exploration, and teamwork, so the difficulty of decision-making grows geometrically. Multi-hero combinations also bring the "catastrophic forgetting" problem: the model forgets what it learned earlier while it learns something new. This has troubled developers for a long time.
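To make the scale concrete, the short calculation below counts 5v5 lineups for a pool of 40 and of 100 heroes. It is only a back-of-the-envelope sketch: it assumes that no hero appears on both sides and that the two sides are distinguishable, and the function name `lineup_count` is made up for this illustration.

```python
from math import comb


def lineup_count(pool_size: int, team_size: int = 5) -> int:
    """Count 5v5 lineups, assuming no hero is used by both sides."""
    # Choose one side's heroes, then the other side's from what is left.
    first_side = comb(pool_size, team_size)
    second_side = comb(pool_size - team_size, team_size)
    return first_side * second_side


for pool in (40, 100):
    print(f"{pool} heroes -> {lineup_count(pool):.3e} lineups")
```

Under these assumptions a 100-hero pool gives a count on the order of 10^15, in line with the figure quoted for the upgraded version, and several thousand times more than a 40-hero pool.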

To handle multi-hero combinations, the technical team first adopted a "teacher clone" approach: each AI teacher is trained to proficiency on a single lineup, and an AI student then imitates all the AI teachers, so that Jue Wu ultimately masters every skill of every hero and becomes a grandmaster.

The team's long-term goal is for Jue Wu to command a full roster: to learn the skills of every hero and play each of them at top level. To that end it made three key technical breakthroughs:

First, the team built a neural network model tailored to MOBA tasks, with strong expressive power and fine-grained modeling of hero operations. The model combines the strengths of many AI methods. Specifically, a long short-term memory network (LSTM) handles partial observability in the time-series information, a convolutional neural network (CNN) encodes spatial features from the image information, an attention mechanism strengthens target selection, action masks improve exploration efficiency, a hierarchical action design speeds up training, and a multi-head value estimate reduces estimation variance.

Network Architecture
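Of the components listed above, the action mask is the easiest to illustrate in isolation. The sketch below is not Jue Wu's implementation; it only shows the general idea of zeroing out the probability of actions that are currently unavailable so that exploration is not wasted on them. The five-action set and the `legal` flags are hypothetical.

```python
import math
import random


def masked_softmax(logits, legal):
    """Softmax over legal actions only; illegal actions get probability 0."""
    if not any(legal):
        raise ValueError("at least one action must be legal")
    # Subtract the largest legal logit for numerical stability.
    peak = max(x for x, ok in zip(logits, legal) if ok)
    exps = [math.exp(x - peak) if ok else 0.0 for x, ok in zip(logits, legal)]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical actions: move, attack, skill_1 (on cooldown), skill_2, recall
logits = [0.2, 1.5, 3.0, 0.7, -1.0]
legal = [True, True, False, True, True]

probs = masked_softmax(logits, legal)
action = random.choices(range(len(probs)), weights=probs)[0]
print(probs, action)
```

Although skill_1 has the largest logit here, it is never sampled while it is masked, and the remaining probabilities still sum to 1.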

Second, the team developed CSPL (Curriculum Self-Play Learning), a training method that broadens the hero pool so that Jue Wu masters every hero's skills. It is a progressive method that leads the AI from easy tasks to hard ones:

The first step is to select several fixed lineups that together cover the whole hero pool, and to train a small model on each with reinforcement learning, yielding several "teacher clone" models.

The second step is distillation, which transfers the capabilities of the multiple models obtained in the first step into a single large model.

The third step is reinforcement training on random lineups: starting from the distilled large model, lineups are sampled at random and reinforcement learning continues as fine-tuning. Combining a range of established and novel techniques lets training cover a large hero pool while remaining extensible to more heroes.

CSPL flow chart. Design philosophy: tasks go from easy to hard, models from simple to complex, and knowledge deepens layer by layer.

Using the CSPL method to expand the hero pool has obvious advantages
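The distillation step of CSPL (the second step above) can be pictured as a student imitating the outputs of several teachers. The sketch below assumes a common formulation, cross-entropy between teacher and student action distributions plus a squared error on value estimates; the article does not spell out the exact loss, so the formula, the `value_weight` parameter, and the sample data are illustrative assumptions.

```python
import math


def cross_entropy(teacher_probs, student_probs, eps=1e-12):
    """H(teacher, student); smallest when the student matches the teacher."""
    return -sum(t * math.log(s + eps) for t, s in zip(teacher_probs, student_probs))


def distillation_loss(batches, value_weight=1.0):
    """Average policy and value imitation loss over samples from all teachers.

    `batches` maps a teacher id to a list of tuples
    (teacher_probs, teacher_value, student_probs, student_value).
    """
    total, count = 0.0, 0
    for samples in batches.values():
        for t_probs, t_value, s_probs, s_value in samples:
            policy_term = cross_entropy(t_probs, s_probs)
            value_term = (t_value - s_value) ** 2
            total += policy_term + value_weight * value_term
            count += 1
    if count == 0:
        raise ValueError("no samples to distill from")
    return total / count


# Made-up outputs from two teachers, each trained on a different fixed lineup.
batches = {
    "teacher_lineup_a": [([0.7, 0.2, 0.1], 0.6, [0.5, 0.3, 0.2], 0.4)],
    "teacher_lineup_b": [([0.1, 0.1, 0.8], -0.2, [0.3, 0.3, 0.4], 0.0)],
}
print(distillation_loss(batches))
```

In a real system the student's outputs would come from the large model and the loss would be minimized by gradient descent; here the numbers are fixed so the calculation can be followed by hand.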

Third, the team built a large-scale training platform, Tencent Enlightenment (aiarena.tencent.com), which draws on the algorithm experience accumulated in the project, anonymized data, and Tencent Cloud's computing resources to supply the large-scale computation that training requires. The Enlightenment platform was opened to 18 universities in August this year. In the future, we hope to give more researchers technical and resource support to deepen their research.

Drafting the lineup: an "AI coach" applies Tian Ji's horse-racing strategy

Winning a match takes not only top players like Jue Wu but also a coach who sets the lineup. From the ancient story of Tian Ji's horse racing to offensive and defensive tactics on the football field, every contest is a strategic game. The team's second goal was to find an AI coach to draft lineups for Jue Wu, that is, to find the optimal strategy for the BP (ban/pick) phase of the game, in which heroes are banned and selected.

Jue Wu vs human BP test

The simplest approach is a greedy strategy: always pick the hero with the highest current win rate. However, Honor of Kings has more than a hundred heroes, and heroes synergize with or counter one another, so picking by win rate alone is easy for opponents to exploit. The draft has to weigh information about both sides, including the heroes already selected and those still available, to maximize one's own advantage and minimize the enemy's.

Inspired by Go AI algorithms, the team built an automatic BP model that combines Monte Carlo tree search (MCTS) with neural networks. MCTS has four steps (selection, expansion, simulation, and backpropagation) and iterates the search to keep refining its estimate of each available hero's long-term value. Because simulation is the most time-consuming step, the team replaced it with a value network, which speeds up the search so that the hero with the greatest long-term value can be chosen quickly and accurately. Note that in board and card games such as Go the winner is known when the game ends, whereas BP only fixes the lineups: the match has not been played, so there is no winner yet. The team therefore used more than 30 million matches generated by self-play to train a lineup win-rate predictor, and the win rates it predicts serve as the supervision signal for training the value network.
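The following toy sketch shows the shape of such a search: the four MCTS steps, with the simulation step replaced by a direct value estimate. Everything specific in it is an assumption made for illustration: an 8-hero pool, teams of 2, a strictly alternating pick order, no ban phase, and a hand-written `value_estimate` function standing in for the trained value network.

```python
import math
import random

HEROES = list(range(8))  # toy pool of hero ids; the real pool has 100+ heroes
TEAM_SIZE = 2            # toy team size; the real game fields 5 per side


def value_estimate(team_a, team_b):
    """Hypothetical stand-in for the value network.

    Returns team A's estimated win probability for a (possibly partial)
    draft, using the hero id as a made-up strength score.
    """
    diff = sum(team_a) - sum(team_b)
    return 1.0 / (1.0 + math.exp(-0.3 * diff))


class Node:
    def __init__(self, team_a, team_b, parent=None):
        self.team_a, self.team_b, self.parent = team_a, team_b, parent
        self.children = {}     # hero id -> child Node
        self.visits = 0
        self.value_sum = 0.0   # accumulated from team A's point of view

    def a_to_move(self):
        # Simplified draft order: A and B alternate, A picks first.
        return len(self.team_a) == len(self.team_b)

    def terminal(self):
        return len(self.team_a) == len(self.team_b) == TEAM_SIZE

    def untried(self):
        taken = set(self.team_a) | set(self.team_b)
        return [h for h in HEROES if h not in taken and h not in self.children]

    def child_after(self, hero):
        if self.a_to_move():
            return Node(self.team_a + (hero,), self.team_b, self)
        return Node(self.team_a, self.team_b + (hero,), self)


def uct(child, parent, c):
    mean = child.value_sum / child.visits
    if not parent.a_to_move():
        mean = 1.0 - mean  # team B prefers drafts that are bad for team A
    return mean + c * math.sqrt(math.log(parent.visits) / child.visits)


def search(root, iterations=3000, c=1.4):
    for _ in range(iterations):
        node = root
        # 1. Selection: follow UCT while the node is fully expanded.
        while not node.terminal() and not node.untried():
            parent = node
            node = max(parent.children.values(), key=lambda ch: uct(ch, parent, c))
        # 2. Expansion: add one untried pick.
        if not node.terminal():
            hero = random.choice(node.untried())
            node.children[hero] = node.child_after(hero)
            node = node.children[hero]
        # 3. Evaluation: the value estimate replaces a simulated playout.
        value = value_estimate(node.team_a, node.team_b)
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += value
            node = node.parent
    # Recommend the most visited first pick.
    return max(root.children, key=lambda h: root.children[h].visits)


random.seed(0)  # fixed seed so the demo is repeatable
root = Node((), ())
best_pick = search(root)
print("recommended first pick for team A:", best_pick)
```

Because the toy value function simply rewards higher hero ids, the search should settle on the strongest hero as the first pick. With a learned value network the same loop would instead account for synergies and counters between the two lineups.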

Besides the usual single-round BP, the AI coach also learned the multi-round BP format common in the Honor of Kings KPL (King Pro League). Heroes cannot be picked again in this mode, which makes the selection strategy more demanding. The team therefore introduced a multi-round, long-horizon evaluation mechanism that plans across a whole BO3/BO5 series to make the best BP choices. The trained BP model achieves a win rate of nearly 70% against a baseline based on the greedy strategy, and close to 90% against lineups picked at random by position.

With strong fighters in front and a military adviser behind, Jue Wu has finally been trained into a true grandmaster.

Extending the research: from supervised learning to reinforcement learning, and back to supervised learning

The team also developed a supervised learning (SL) method that models macro strategy and micro-operation jointly, giving Jue Wu strong long-term planning and real-time operation and reaching the level of top non-professional players. The results were first shown publicly in matches against human players in December 2018, and the team's work on supervised learning has continued since. From November 14 this year, several of Jue Wu's levels 1 to 19 are trained with supervised learning.

Paper address of supervised learning methods: https://arxiv.org/abs/2011.12582

Although in theory an AI trained by supervised learning performs worse than one trained by reinforcement learning, this line of work has great research and application value, and the related results have been accepted by the top journal TNNLS.

In terms of research methodology, supervised learning is highly valuable for developing AI agents. First, supervised learning that mines human data to predict future actions is usually the first step in developing a game AI, and it has worked well in many video games; in complex games such as StarCraft, for example, pure supervised learning can produce agents that reach the level of expert human players. Second, the result can be reused as the policy network for reinforcement learning; AlphaGo, for example, combines supervised learning with reinforcement learning. Third, it can shorten the exploration time of reinforcement learning; for example, DeepMind's StarCraft AI AlphaStar uses supervised learning to provide the implicit state for reinforcement training.

In application, supervised learning has several advantages. First, training is fast: it takes only a few days on 16 GPUs, whereas reinforcement learning takes several months. Second, it scales well and can train on the whole hero pool. Finally, because it uses anonymized data from real players with effective sampling, the resulting AI behaves more like a human.

Network Architecture

Technology Application

On the one hand, Jue Wu will focus on e-sports. As the most popular sport among young people in the digital age, e-sports became a demonstration event at the 2018 Asian Games, where the Chinese team won two golds and one silver. Like traditional athletes, professional e-sports players need hand-eye-brain coordination, fast strategic and mechanical reactions, teamwork, and a great deal of hard training. Drawing on its strengths in algorithms and data, Jue Wu can give professional players real-time analysis and advice on data, strategy, and collaboration, as well as professional sparring at different strengths and levels. By using cutting-edge technology to advance the professional development of e-sports, AI will help Chinese e-sports stay ahead in the world. On the other hand, Jue Wu can take part in game design, for example testing hero balance and tuning parameters, improving test efficiency, optimizing character balance, and contributing to the development of new MOBA maps.

Tencent AI Lab and Honor of Kings have also jointly launched the AI+games open platform "Enlightenment" to build an industry-university-research ecosystem. Honor of Kings opens up anonymized data, the game core cluster (Game Core), and tools, while Tencent AI Lab opens up its computing platform and computing power for reinforcement learning and imitation learning. Universities and research institutions are invited to jointly advance cutting-edge AI research, making Enlightenment a leading stage for showcasing agent research. Enlightenment will hold its first level test in December 2020.

Long-term goal

Tencent AI Lab studies not only MOBA games, represented by Honor of Kings, but also several other AI+games directions in parallel. In board and card games, we developed the AI "Fine Art", which won four world championships in three years and serves as a training partner for the Chinese national Go team. In RTS games, represented by StarCraft II, which require long-horizon decisions in a complex, continuous decision space under incomplete information, we developed the first agent to defeat the cheating-level built-in AI in a full game of StarCraft II. In FPS games, we focused on problems such as 3D environment modeling, perception conversion, and movement tracking, first winning China's first championship in the history of the VizDoom AI competition and later launching an FPS AI in the mobile game "Cross the Line of Fire-Gunfight King (CFM)", where it was well received.

In the long run, AI+games research will be a key step for Tencent toward the ultimate AI research problem: artificial general intelligence (AGI). AGI refers to AI that can carry out a wide variety of complex instructions within one general system, reaching or surpassing human level. From Fine Art to Jue Wu, the AI keeps learning and evolving from 0 to 1 and develops a set of reasonable behavior patterns. The experience, methods, and conclusions gained along the way are expected to have a deeper long-term impact in fields such as healthcare, manufacturing, autonomous driving, agriculture, and smart city management.

☟Learn more:

AI Super Club! The strategic and collaborative AI "Jue Wu" made its first appearance in the KPL finals and surprised the audience!

Tencent's strategic and collaborative AI "Jue Wu" upgraded to Honor of Kings professional level

Tencent AI Lab x Honor of Kings: opening up to bring the imagination of "AI + games" to life

Tencent's "Jue Wu" AI 1v1 paper accepted by AAAI, and internal beta of the AI + games open platform "Enlightenment" launched

Strategic and collaborative AI "Jue Wu" limited-time challenge event is waiting for you!

* Reprints are welcome; please credit the Tencent AI Lab WeChat account (tencent_ailab)

Origin blog.csdn.net/Tencent_TEG/article/details/110412197