Reinforcement Learning: An Introduction (translation of Section 1.1)

Chapter 1

Introduction

When we think about the nature of learning, the first idea that may come to mind is learning by interacting with our environment. When an infant plays, waves its arms, or looks about, it has no explicit teacher, but it does have a direct sensorimotor connection to its environment. Exercising this connection produces a wealth of information about cause and effect, about the consequences of actions, and about what to do in order to achieve goals. Throughout our lives, such interaction is undoubtedly a major source of knowledge about our environment and ourselves. Whether we are learning to drive a car or to hold a conversation, we are acutely aware of how our environment responds to what we do, and we seek to influence what happens through our behavior. Learning from interaction is a foundational idea underlying nearly all theories of learning and intelligence.

In this book we explore a computational approach to learning from interaction. Rather than directly theorizing about how people or animals learn, we primarily explore idealized learning situations and evaluate the effectiveness of various learning methods; that is, we take the perspective of an artificial intelligence researcher or engineer. We explore designs for machines that are effective in solving learning problems of scientific or economic interest, evaluating those designs through mathematical analysis or computational experiments. The approach we explore is called reinforcement learning; much more than other machine learning approaches, it focuses on goal-directed learning from interaction.

1.1 Reinforcement Learning

Reinforcement learning is learning how to map situations to actions so as to maximize a numerical reward signal. The learner is not told which actions to take, but must instead discover which actions yield the most reward by trying them. In the most interesting and challenging cases, actions may affect not only the immediate reward but also the next situation and, through it, all subsequent rewards. These two characteristics, trial-and-error search and delayed reward, are the two most important distinguishing features of reinforcement learning.
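To make the situation-to-action mapping and the reward signal concrete, here is a minimal sketch of the agent-environment interaction loop in Python. The `Environment` and `Agent` classes and their toy reward rule are hypothetical placeholders, not anything from the book; only the shape of the loop (sense a state, choose an action, receive a numerical reward, learn from the consequence) is the point.

```python
# A minimal sketch of the reinforcement learning interaction loop.
# Environment and Agent are hypothetical placeholders; only the loop
# structure (observe state, act, receive reward, learn) is essential.

class Environment:
    def reset(self):
        """Return the initial state."""
        return 0

    def step(self, action):
        """Apply an action; return (next_state, reward, done)."""
        reward = 1.0 if action == 1 else 0.0  # toy reward rule, made up
        return 0, reward, False

class Agent:
    def act(self, state):
        """Map the current situation (state) to an action."""
        return 1

    def learn(self, state, action, reward, next_state):
        """Update behavior from the consequence of the action."""
        pass

env, agent = Environment(), Agent()
state = env.reset()
for t in range(100):
    action = agent.act(state)                    # situation -> action
    next_state, reward, done = env.step(action)  # numerical reward signal
    agent.learn(state, action, reward, next_state)
    state = next_state
    if done:
        break
```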

Reinforcement learning, like many topics whose names end in "ing", such as machine learning and mountaineering, is simultaneously a problem, a class of solution methods that work well on that problem, and the field that studies the problem and its solution methods. It is convenient to use a single name for all three things, but it is equally essential to keep the three conceptually separate. In particular, the distinction between problems and solution methods is very important in reinforcement learning; failing to make this distinction is the source of much confusion.

We formalize the reinforcement learning problem using ideas from dynamical systems theory, specifically as the optimal control of incompletely known Markov decision processes. The details of this formalization must wait until Chapter 3, but the basic idea is simply to capture the most important aspects of the real problem facing a learning agent interacting with its environment to achieve a goal. A learning agent must be able to sense the state of its environment to some extent, and must be able to take actions that affect that state. The agent must also have a goal or goals relating to the state of the environment. A Markov decision process is intended to include just these three aspects, sensation, action, and goal, in their simplest possible form without trivializing any of them. Any method well suited to solving such problems we consider to be a reinforcement learning method.
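As a rough illustration of these three ingredients, a finite Markov decision process can be written down as nothing more than states, actions, transition probabilities, and rewards. The two-state example below, loosely echoing the battery-monitoring agent mentioned later in this section, is invented for illustration; the formal treatment is deferred to Chapter 3.

```python
# A toy finite Markov decision process, invented for illustration.
# transitions[s][a] is a list of (probability, next_state, reward) triples:
# the agent senses state s, takes action a, and the environment responds
# stochastically. The "goal" enters only through the reward signal.

transitions = {
    "low_charge":  {"recharge": [(1.0, "high_charge", 0.0)],
                    "search":   [(0.5, "high_charge", 1.0),
                                 (0.5, "low_charge", -1.0)]},
    "high_charge": {"search":   [(0.8, "high_charge", 1.0),
                                 (0.2, "low_charge",  1.0)]},
}

def expected_reward(state, action):
    """One-step expected reward for taking `action` in `state`."""
    return sum(p * r for p, _, r in transitions[state][action])

print(expected_reward("low_charge", "search"))  # 0.5 * 1.0 + 0.5 * -1.0 = 0.0
```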

Reinforcement learning is different from supervised learning, the kind of learning studied in most current research in the field of machine learning. Supervised learning is learning from a training set of labeled examples provided by a knowledgeable external supervisor. Each example is a description of a situation together with a specification, the label, of the correct action the system should take in that situation. The purpose of this kind of learning is for the system to extrapolate, or generalize, its responses so that it acts correctly in situations not present in the training set. This is an important kind of learning, but on its own it is inadequate for learning from interaction. In interactive problems it is often impractical to obtain examples of desired behavior that are both correct and representative of all the situations in which the agent has to act. In uncharted territory, where one would expect learning to be most beneficial, an agent must be able to learn from its own experience.

Reinforcement learning is also different from what machine learning researchers call unsupervised learning, which is typically about finding structure hidden in collections of unlabeled data. The terms supervised learning and unsupervised learning might seem to exhaustively classify machine learning paradigms, but they do not. Although one might be tempted to think of reinforcement learning as a kind of unsupervised learning because it does not rely on examples of correct behavior, reinforcement learning is trying to maximize a reward signal rather than trying to find hidden structure. Uncovering structure in an agent's experience can certainly be useful in reinforcement learning, but by itself it does not address the reinforcement learning problem of maximizing a reward signal. We therefore consider reinforcement learning to be a third machine learning paradigm, alongside supervised learning and unsupervised learning, and perhaps other paradigms as well.

One of the challenges that arises in reinforcement learning, and not in other kinds of learning, is the trade-off between exploration and exploitation. To obtain a lot of reward, a reinforcement learning agent must prefer actions that it has tried in the past and found to be effective in producing reward. But to discover such actions, it has to try actions that it has not selected before. The agent has to exploit what it has already experienced in order to obtain reward, but it also has to explore in order to make better action selections in the future. The dilemma is that neither exploration nor exploitation can be pursued exclusively without failing at the task. The agent must try a variety of actions and progressively favor those that appear to be best. On a stochastic task, each action must be tried many times to gain a reliable estimate of its expected reward. The exploration-exploitation dilemma has been intensively studied by mathematicians for many decades, yet remains unresolved. For now, we simply note that the whole issue of balancing exploration and exploitation does not even arise in supervised and unsupervised learning, at least in the purest forms of these paradigms.
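One common way to make this trade-off concrete is the epsilon-greedy rule on a simple multi-armed bandit task: with small probability the agent explores a random action, and otherwise it exploits the action with the highest estimated reward. This is a standard illustrative device rather than anything specific to this section, and the true reward probabilities below are made up.

```python
import random

# Epsilon-greedy action selection on a toy 3-armed bandit.
# With probability epsilon the agent explores (random action);
# otherwise it exploits (greedy action). Repeated tries build a
# reliable estimate of each action's expected reward.

true_reward_prob = [0.2, 0.5, 0.8]   # unknown to the agent; invented values
estimates = [0.0, 0.0, 0.0]          # sample-average reward estimates
counts = [0, 0, 0]
epsilon = 0.1

random.seed(0)
for step in range(1000):
    if random.random() < epsilon:
        action = random.randrange(3)                        # explore
    else:
        action = max(range(3), key=lambda a: estimates[a])  # exploit
    reward = 1.0 if random.random() < true_reward_prob[action] else 0.0
    counts[action] += 1
    # Incremental sample-average update of the estimate.
    estimates[action] += (reward - estimates[action]) / counts[action]

print(estimates)  # estimates should approach [0.2, 0.5, 0.8]
```

Note that with epsilon fixed at zero the agent would lock onto whichever arm happened to pay off first, which is exactly the failure of pure exploitation described above.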

Another key feature of reinforcement learning is that it explicitly considers the whole problem of a goal-directed agent interacting with an environment. This contrasts with many approaches that consider subproblems without addressing how they might fit into a larger picture. For example, we have mentioned that much machine learning research is concerned with supervised learning without explicitly specifying how such an ability would finally be useful. Other researchers have developed theories of planning with general goals, but without considering planning's role in real-time decision making, or the question of where the predictive models necessary for planning would come from. Although these approaches have yielded many useful results, their focus on isolated subproblems is a significant limitation.

Reinforcement learning takes the opposite tack, starting with a complete, interactive, goal-seeking agent. All reinforcement learning agents have explicit goals, can sense aspects of their environments, and can choose actions to influence their environments. Moreover, it is usually assumed from the beginning that the agent has to operate despite significant uncertainty about the environment it faces. When reinforcement learning involves planning, it has to address the interplay between planning and real-time action selection, as well as the question of how environment models are acquired and improved. When reinforcement learning involves supervised learning, it does so for specific reasons that determine which capabilities are critical and which are not. For learning research to make progress, important subproblems have to be isolated and studied, but they should be subproblems that play clear roles in complete, interactive, goal-seeking agents, even if all the details of the complete agent cannot yet be filled in.

By a complete, interactive, goal-seeking agent we do not always mean something like a complete organism or robot. These are clearly examples, but a complete, interactive, goal-seeking agent can also be a component of a larger behaving system. In this case, the agent directly interacts with the rest of the larger system and indirectly interacts with the larger system's environment. A simple example is an agent that monitors the charge level of a robot's battery and sends commands to the robot's control architecture. This agent's environment is the rest of the robot together with the robot's environment. One must look beyond the most obvious examples of agents and their environments to appreciate the generality of the reinforcement learning framework.

One of the most exciting aspects of modern reinforcement learning is its substantive and fruitful interaction with other engineering and scientific disciplines. Reinforcement learning is part of a decades-long trend within artificial intelligence and machine learning toward greater integration with statistics, optimization, and other mathematical subjects. For example, the ability of some reinforcement learning methods to learn with parameterized approximators addresses the classical "curse of dimensionality" in operations research and control theory. More distinctively, reinforcement learning has also interacted strongly with psychology and neuroscience, with substantial benefits flowing both ways. Of all the forms of machine learning, reinforcement learning is the closest to the kind of learning that humans and other animals do, and many of its core algorithms were originally inspired by biological learning systems. Reinforcement learning has also given back, both through psychological models of animal learning that better match some of the empirical data, and through an influential model of part of the brain's reward system. The body of this book develops the ideas of reinforcement learning that pertain to engineering and artificial intelligence; its connections to psychology and neuroscience are summarized in Chapters 14 and 15.

Finally, reinforcement learning is also part of a larger trend in artificial intelligence back toward simple general principles. Since the late 1960s, many artificial intelligence researchers presumed that there are no general principles to be discovered, and that intelligence is instead due to the possession of a vast number of special-purpose tricks, procedures, and heuristics. It was sometimes said that if we could just get enough relevant facts into a machine, say one million or one billion, then it would become intelligent. Methods based on general principles, such as search or learning, were dismissed as "weak methods", whereas those based on specific knowledge were called "strong methods". This view is less common today. From our point of view, it was simply premature: too little effort had been put into the search for general principles to conclude that there were none. Modern artificial intelligence now includes much research looking for general principles of learning, search, and decision making. It is not clear how far back the pendulum will swing, but reinforcement learning research is certainly part of the swing back toward simpler and fewer general principles of artificial intelligence.
