Learning to Design Games: Strategic Environments in Reinforcement Learning (partial translation)

Abstract

In a typical reinforcement learning (RL) setting, the environment is assumed to be given, and the goal of learning is to identify the agent's optimal policy for acting in that environment through interaction with it. In this paper, we extend this setup by considering an environment that is not given but is instead controllable and learnable through its real-time interaction with the agent. This extension is motivated by real-world environment design scenarios, including game design, shopping space design, and traffic signal design. Theoretically, we identify a dual Markov decision process (MDP) with respect to the environment, dual to the agent's MDP, and derive a policy gradient solution for optimizing the parameterized environment. Furthermore, discontinuous environments are handled by a proposed generative framework. Our experiments on maze game design tasks show the effectiveness of the method in generating diverse and challenging mazes against various agent settings.

1. Introduction

Reinforcement learning (RL) typically involves a scenario in which an agent (or multiple agents) takes actions and receives rewards from the environment. The goal of learning is to find an optimal policy for the agent that maximizes the cumulative reward obtained from interacting with the environment. RL has seen many successful applications, including game playing, scheduling, and advertising.

In most RL methods, such as SARSA and Q-learning, the model of the environment need not be known a priori before learning the agent's optimal policy. Alternatively, model-based methods such as Dyna and prioritized sweeping build a model of the environment while learning the optimal policy. In either case, however, the environment is assumed to be fixed and given: whether stationary or non-stationary, it is not purposefully controlled.
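As a concrete illustration of the model-free case, the following is a minimal sketch of tabular Q-learning on a toy corridor task. The `Corridor` class and its `reset`/`step` interface are illustrative assumptions for this sketch, not something defined in the paper; the point is that the agent learns an optimal policy without ever building a transition model:

```python
import random

class Corridor:
    """Toy 4-cell corridor: start at cell 0, reward 1 for reaching cell 3."""
    n_actions = 2  # 0 = left, 1 = right

    def reset(self):
        self.s = 0
        return self.s

    def step(self, a):
        self.s = max(0, self.s - 1) if a == 0 else min(3, self.s + 1)
        done = (self.s == 3)
        return self.s, (1.0 if done else 0.0), done

def q_learning(env, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    random.seed(seed)
    Q = {}  # Q[(state, action)] -> estimated return
    for _ in range(episodes):
        s, done = env.reset(), False
        while not done:
            if random.random() < epsilon:
                a = random.randrange(env.n_actions)  # explore
            else:
                a = max(range(env.n_actions), key=lambda x: Q.get((s, x), 0.0))
            s2, r, done = env.step(a)
            # model-free temporal-difference update: no environment model is built
            best_next = 0.0 if done else max(
                Q.get((s2, x), 0.0) for x in range(env.n_actions))
            q = Q.get((s, a), 0.0)
            Q[(s, a)] = q + alpha * (r + gamma * best_next - q)
            s = s2
    return Q

Q = q_learning(Corridor())
# greedy policy at each non-terminal cell; 1 means "move right"
policy = [max(range(2), key=lambda a: Q.get((s, a), 0.0)) for s in range(3)]
```

After training, the greedy policy moves right from every cell, and the Q-values approach the discounted returns (e.g. roughly `0.9` for moving right from cell 1).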

In this paper, we extend the standard RL setting by considering a strategic and controllable environment. Our goal is to design the environment through its interaction with a learnable agent (or multiple agents). This has many potential applications: designing a game (environment) whose difficulty level caters to the learning progress of its current players, designing a shopping space that encourages customers to shop or browse longer, or designing traffic signals to control congestion. In general, we formulate the problem of designing environments that interact with intelligent agents or humans, and we consider carrying out such design via machine learning to reduce human labor and improve efficiency. Compared with the well-studied image design/generation problem, environment design is novel in several respects: (i) there are no ground-truth samples; (ii) the generated samples may be discontinuous; (iii) the evaluation of a sample is carried out by the learning agents themselves.

Our formulation extends the RL setting with an environment model and its control. In particular, in the adversarial case, the agent's goal is to maximize its cumulative reward, while the environment aims to minimize the reward it yields to the agent under the agent's optimal policy. This effectively creates a minimax game between the environment and the agent. Given the MDP the agent acts in, we can theoretically derive a dual MDP with respect to the environment, defined in terms of the agent's current state and the actions taken to modify the environment. Solving this dual MDP yields a policy gradient solution for optimizing the parameterized environment toward its objective. When the environment parameters are discontinuous, we propose a generative-model framework for the optimization, which overcomes the limitation of a non-smooth environment space. Our experiments on maze game generation tasks show that the method effectively generates diverse and challenging mazes for different types of agents. We show that our algorithm successfully discovers an agent's weaknesses and exploits them, thereby generating purposeful environments. The main contributions of this paper are threefold: (i) we propose a novel environment design problem with practical application potential; (ii) we reduce the problem to a policy optimization problem in the continuous case, and propose a generative framework in the discontinuous case; (iii) we apply our methods to maze game design tasks and demonstrate their effectiveness by generating distinctive mazes.
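The adversarial objective can be sketched with a score-function (REINFORCE-style) gradient estimator on the environment parameters. The toy setting below is an illustrative assumption, not the paper's actual algorithm: the designer places the goal cell of a maze according to a softmax distribution over logits `theta`, observes the agent's success probability in the sampled environment (here a hypothetical fixed vector `agent_success`), and follows the descent direction to minimize it, so the distribution concentrates on the cell where the agent is weakest:

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def design_environment(agent_success, steps=5000, lr=0.1, seed=0):
    """Minimize the agent's expected return over environment configurations.

    agent_success[c] is the (hypothetical) probability that the agent solves
    the maze when the goal is placed in cell c. The environment designer
    adjusts the goal-placement logits `theta` by stochastic gradient descent,
    using the score-function estimator: grad E[R] = E[grad log p(c) * R].
    """
    random.seed(seed)
    theta = [0.0] * len(agent_success)
    for _ in range(steps):
        p = softmax(theta)
        # sample an environment configuration (a goal cell) from p
        c = random.choices(range(len(p)), weights=p)[0]
        R = agent_success[c]  # agent's return in this sampled environment
        # descend on theta to minimize the agent's expected return
        for i in range(len(theta)):
            grad_log_p = (1.0 if i == c else 0.0) - p[i]
            theta[i] -= lr * grad_log_p * R
    return theta

# the agent is weakest (success 0.2) on cell 2, so a purposeful adversarial
# designer should learn to concentrate goal placement there
theta = design_environment([0.9, 0.7, 0.2, 0.8])
```

In this sketch the designer "discovers the agent's weakness" in exactly the sense described above: the learned distribution peaks on the configuration with the lowest agent success rate. In practice the return `R` would come from actually training and evaluating an agent in the sampled environment rather than from a fixed lookup table.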

Origin blog.csdn.net/weixin_41111088/article/details/87455936