Reinforcement Learning: An Introduction, translation of Section 1.4

1.4 Limitations and Scope

Reinforcement learning relies heavily on the concept of state: it is the input to the policy and value function, and both the input to and output from the model. Informally, we can think of the state as a signal conveying to the agent some sense of "how the environment is" at a particular time. Chapter 3 presents the framework of Markov decision processes, which gives the formal definition of state. More generally, however, we encourage the reader to follow the informal meaning and think of the state as whatever information the agent has available about its environment. In effect, we assume that the state signal is produced by some preprocessing system that is nominally part of the agent's environment. We do not address the issues of constructing, changing, or learning the state signal in this book (other than the brief discussion in Section 17.3). We take this approach not because we regard state representation as unimportant, but in order to focus fully on the decision-making issues. In other words, our concern in this book is not with designing the state signal, but with deciding what action to take as a function of whatever state signal is available.
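
To make the three roles of the state signal concrete, here is a minimal Python sketch. It is not from the book; all names and the two-state example are made up purely for illustration.

```python
import random

# Hypothetical two-state, two-action example.
ACTIONS = ["left", "right"]
V = {"s0": 0.0, "s1": 0.0}   # value estimates indexed by state

def policy(state):
    """The policy takes the state signal as input and returns an action."""
    return random.choice(ACTIONS)

def value(state):
    """The value function takes the state as input and returns an estimated return."""
    return V[state]

def model(state, action):
    """The model takes a state (and action) as input and outputs a predicted
    next state and reward."""
    next_state = "s1" if action == "right" else "s0"
    reward = 1.0 if next_state == "s1" else 0.0
    return next_state, reward

# Example use: pick an action from the current state signal and ask the model
# what is expected to happen next.
s = "s0"
a = policy(s)
print(a, value(s), model(s, a))
```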

Most of the reinforcement learning methods we consider in this book are structured around estimating value functions, but this is not strictly necessary for solving reinforcement learning problems. For example, solution methods such as genetic algorithms, genetic programming, simulated annealing, and other optimization methods never estimate value functions. These methods apply multiple static policies, each interacting with a separate instance of the environment over an extended period of time. The policies that obtain the most reward, together with random variations of them, are carried over to the next generation of policies, and the process repeats. We call these evolutionary methods because their operation is analogous to the way biological evolution produces organisms with skilled behavior even when they do not learn during their individual lifetimes. If the space of policies is sufficiently small, or can be structured so that good policies are common or easy to find, or if a lot of time is available for the search, then evolutionary methods can be effective. In addition, evolutionary methods have advantages on problems in which the learning agent cannot sense the complete state of its environment.
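
The evolutionary approach described above can be sketched in a few lines. The following is a hypothetical illustration, not code from the book: a small population of static lookup-table policies is scored on a made-up toy chain environment, the highest-returning policies and random variations of them form the next generation, and no value function is ever estimated.

```python
import random

N_STATES, N_ACTIONS = 5, 2

def run_episode(policy, steps=20):
    """Total reward a fixed (static) policy earns in a fresh instance of a toy
    chain environment: action 1 moves right, being in the last state pays +1."""
    state, total = 0, 0.0
    for _ in range(steps):
        action = policy[state]                       # policy is a lookup table
        state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
        total += 1.0 if state == N_STATES - 1 else 0.0
    return total

def mutate(policy, rate=0.2):
    """Random variation of a policy: re-draw each action with small probability."""
    return [a if random.random() > rate else random.randrange(N_ACTIONS) for a in policy]

population = [[random.randrange(N_ACTIONS) for _ in range(N_STATES)] for _ in range(10)]
for generation in range(30):
    scored = sorted(population, key=run_episode, reverse=True)
    elite = scored[:3]                               # policies with the most reward survive
    population = elite + [mutate(random.choice(elite)) for _ in range(7)]

best = max(population, key=run_episode)
print("best policy:", best, "return:", run_episode(best))
```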

Our focus is on reinforcement learning methods that learn while interacting with the environment, which evolutionary methods do not do. In many cases, methods able to take advantage of the details of individual behavioral interactions can be much more efficient than evolutionary methods. Evolutionary methods ignore much of the useful structure of the reinforcement learning problem: they do not use the fact that the policy they are searching for is a function from states to actions; they do not notice which states an individual passes through during its lifetime, or which actions it selects. In some cases this information can be misleading (for example, when states are misperceived), but more often it should enable more efficient search. Although evolution and learning share many features and naturally work together, we do not consider evolutionary methods by themselves to be especially well suited to reinforcement learning problems, and accordingly we do not cover them in this book.
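
By contrast, a value-based learner exploits exactly the structure that evolutionary search ignores: it updates from each individual (state, action, reward, next state) transition it experiences. Below is a minimal tabular Q-learning sketch on the same made-up toy chain as above; it is an illustrative assumption of ours, not the book's code.

```python
import random

N_STATES, N_ACTIONS = 5, 2
Q = [[0.0] * N_ACTIONS for _ in range(N_STATES)]    # action-value estimates
alpha, gamma, epsilon = 0.1, 0.9, 0.1

def step(state, action):
    """Toy chain dynamics: action 1 moves right; being in the last state pays +1."""
    next_state = min(state + 1, N_STATES - 1) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward

state = 0
for t in range(2000):
    # epsilon-greedy action selection as a function of the current state signal
    action = random.randrange(N_ACTIONS) if random.random() < epsilon \
        else max(range(N_ACTIONS), key=lambda a: Q[state][a])
    next_state, reward = step(state, action)
    # the update uses the individual (state, action, reward, next state) transition
    Q[state][action] += alpha * (reward + gamma * max(Q[next_state]) - Q[state][action])
    state = next_state
```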

Origin blog.csdn.net/wangyifan123456zz/article/details/107381030