Reinforcement Learning: An Introduction, translation of Section 1.2

1.2 Examples

A good way to understand reinforcement learning is to consider some of the examples and possible applications that have guided its development.

(1) A master chess player makes a move. The choice is informed both by planning, anticipating possible replies and counter-replies, and by immediate, intuitive judgments of the desirability of particular positions and moves.

(2) An adaptive controller adjusts the operating parameters of a petroleum refinery in real time. The controller optimizes the yield/cost/quality trade-off on the basis of specified marginal costs, without sticking strictly to the set points originally suggested by the engineers.

(3) A gazelle calf struggles to its feet minutes after being born. Half an hour later it is running at 20 miles per hour.

(4) A mobile robot decides whether it should enter a new room in search of more trash to collect or start trying to find its way back to its battery recharging station. It makes its decision based on the current charge level of its battery and on how quickly and easily it has been able to find the recharger in the past.

(5) Phil prepares his breakfast. Closely examined, even this apparently mundane activity reveals a complex web of conditional behavior and interlocking goal-subgoal relationships: walking to the cupboard, opening it, selecting a cereal box, then reaching for, grasping, and retrieving the box. Other complex, tuned, interactive sequences of behavior are required to obtain a bowl, a spoon, and the milk carton. Each step involves a series of eye movements to obtain information and to guide reaching and locomotion. Judgments are continually being made about how to carry the objects, or whether it is better to ferry some of them to the table before obtaining the others. Each step is guided by goals, such as grasping a spoon or getting to the refrigerator, and also serves other goals, such as having the spoon to eat with once the cereal is prepared and, ultimately, obtaining nourishment. Whether he is aware of it or not, Phil is accessing information about the state of his body that determines his nutritional needs, level of hunger, and food preferences.

These examples share features so basic that they are easy to overlook. All involve interaction between an active decision-making agent and its environment, within which the agent seeks to achieve a goal despite uncertainty about that environment. The agent's actions are permitted to affect the future state of the environment (e.g., the next chess position, the levels of the refinery's reservoirs, the robot's next location and the future charge level of its battery), thereby affecting the options and opportunities available to the agent at later times. Correct choice requires taking into account the indirect, delayed consequences of actions, and so may require foresight or planning.
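
Since every example above shares this agent-environment loop, a minimal Python sketch may help make the structure concrete. The Environment and Agent classes and the "search"/"recharge" actions below (loosely modeled on the mobile-robot example) are hypothetical illustrations invented for this note, not code from the book; they only show how each action changes the state the agent sees next, so that good choices must account for delayed consequences.

```python
import random


class Environment:
    """Toy environment: a battery level that the agent's actions affect."""

    def __init__(self):
        self.battery = 1.0  # start fully charged

    def step(self, action):
        # The chosen action changes the future state of the environment.
        if action == "search":
            self.battery -= random.uniform(0.1, 0.3)  # searching drains the battery
            reward = 1.0                               # but finds some trash
        else:  # "recharge"
            self.battery = 1.0
            reward = 0.0
        done = self.battery <= 0.0                     # dead battery ends the episode
        return self.battery, reward, done


class Agent:
    """Toy agent: a fixed threshold rule standing in for a learned policy."""

    def act(self, battery):
        return "search" if battery > 0.4 else "recharge"


env, agent = Environment(), Agent()
state, total_reward = env.battery, 0.0
for _ in range(20):
    action = agent.act(state)                # decide based on the current state
    state, reward, done = env.step(action)   # the action affects the next state
    total_reward += reward
    if done:
        break
print("total reward:", total_reward)
```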

At the same time, in all of these examples the effects of actions cannot be fully predicted, so the agent must monitor its environment frequently and react appropriately. For example, Phil must watch the milk he pours into his cereal bowl to keep it from overflowing. All of these examples involve goals that are explicit in the sense that the agent can judge progress toward its goal based on what it can sense directly. The chess player knows whether or not he wins, the refinery controller knows how much petroleum is being produced, the gazelle calf knows when it falls, the mobile robot knows when its batteries run down, and Phil knows whether or not he is enjoying his breakfast.

In all of these examples the agent can use its experience to improve its performance over time. The chess player refines the intuition he uses to evaluate positions, thereby improving his play; the gazelle calf improves the efficiency with which it can run; Phil learns to streamline the making of his breakfast. The knowledge the agent brings to the task at the start, whether from previous experience with related tasks or built in by design or evolution, influences what is useful or easy to learn, but interaction with the environment is essential for adjusting behavior to exploit the specific features of the task.

Origin blog.csdn.net/wangyifan123456zz/article/details/107380976