Reinforcement Learning: An Introduction study notes (2)

Introduction to Reinforcement Learning

Wow, there is already a ready-made partial translation, reproduced from: https://blog.csdn.net/thousandsofwind/article/details/79710209

1.2 Examples

A good way to understand reinforcement learning is to consider some of the examples and possible applications that have guided its development.

  • A master chess player makes a move. The choice is informed both by planning (anticipating possible replies and counter-replies) and by immediate, intuitive judgments of the desirability of particular positions and moves.
  • An adaptive controller adjusts the parameters of a petroleum refinery's operation in real time. The controller optimizes the yield/cost/quality trade-off on the basis of specified marginal costs, without sticking strictly to the set points originally suggested by engineers.
  • A gazelle calf struggles to its feet minutes after being born. Half an hour later, it is running at 20 miles per hour.
  • A housekeeping mobile robot decides whether it should enter a new room in search of more trash to collect or start finding its way back to its battery-charging station. It makes its decision based on the current charge level of its battery and on how quickly and easily it has been able to find the charger in the past.
  • Phil prepares his breakfast. Closely examined, even this mundane activity reveals a complex web of conditional behavior and interlocking goal-subgoal relationships: walking to the cupboard, opening it, selecting a cereal box, then reaching for, grasping, and retrieving the box. Other complex, tuned sequences of behavior are required to obtain a bowl, a spoon, and the milk jug. Each step involves a series of eye movements to obtain information and to guide reaching and locomotion. Judgments are continually made about how to carry the objects, or whether it is better to ferry some of them to the table before obtaining others. Each step is guided by goals, such as grasping a spoon or getting to the refrigerator, and is in service of other goals, such as having the spoon to eat with once the cereal is prepared and ultimately obtaining nourishment. Whether he is aware of it or not, Phil is accessing information about the state of his body that determines his nutritional needs, level of hunger, and food preferences.

These examples share features so basic that they are easy to overlook. All involve interaction between an active decision-making agent and its environment, in which the agent seeks to achieve a goal despite uncertainty about that environment. The agent's actions affect the future state of the environment (e.g., the next chess position, the levels of the refinery's reservoirs, the robot's next location and the future charge level of its battery), which in turn affects the options available to the agent later and the situations it faces. Correct choice requires taking into account the indirect, delayed consequences of actions, and therefore demands foresight or planning.
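To make this interaction concrete, here is a minimal sketch in Python of the agent-environment loop these examples share. The toy environment and the random placeholder policy are invented for illustration; nothing here comes from the book.

```python
import random

# A minimal sketch of the agent-environment loop. GridEnvironment and the
# random placeholder policy are invented for illustration.

class GridEnvironment:
    """Toy environment: an agent walks along a line toward a goal position."""

    def __init__(self, goal=5):
        self.goal = goal
        self.position = 0

    def step(self, action):
        # The agent's action affects the environment (its position changes)...
        self.position += action
        # ...which in turn determines the reward and the situation the agent
        # faces at its next decision.
        reward = 1.0 if self.position == self.goal else 0.0
        done = self.position == self.goal
        return self.position, reward, done

env = GridEnvironment()
done = False
for step in range(1000):  # cap the episode length so the loop always ends
    if done:
        break
    action = random.choice([-1, 1])  # placeholder: a real agent plans and learns
    state, reward, done = env.step(action)
```

A real agent would replace the random choice with a policy that accounts for delayed consequences; the point here is only the feedback loop itself: an action changes the state, and the new state and reward inform the next action.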

At the same time, the effects of actions cannot be fully predicted in any of these examples, so the agent must monitor its environment frequently and react appropriately. Phil, for example, must watch the milk he pours into his cereal bowl to keep it from overflowing. All of these examples involve goals that are explicit in the sense that the agent can judge progress toward its goal based on what it can sense directly. The chess player knows whether he wins, the refinery controller knows how much petroleum is being produced, the mobile robot knows when its battery runs down, and Phil knows whether he is enjoying his breakfast.

In all of these examples, the agent can use its experience to improve its performance over time. The chess player refines the intuition he uses to evaluate positions, thereby improving his play; the gazelle calf improves the efficiency with which it runs; Phil streamlines his breakfast-making. The knowledge the agent brings to the task at the start, whether from previous experience with related tasks or built in by design and evolution, influences what is useful or easy to learn, but interaction with the environment is essential for adjusting behavior to exploit the specific features of the task.
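As a preview of how "using experience to improve performance" can be made precise, the sketch below applies an incremental sample-average update, in which each observed reward nudges a value estimate toward the mean of everything seen so far. A rule of this form is developed formally later in the book; the reward sequence here is made up for illustration.

```python
# A sketch of improving an estimate from experience: an incremental
# sample-average update. Each reward nudges the running estimate toward
# the mean of all rewards seen so far, without storing past rewards.

def update_estimate(estimate, reward, n):
    """Incremental mean: estimate <- estimate + (1/n) * (reward - estimate)."""
    return estimate + (reward - estimate) / n

estimate = 0.0
rewards = [1.0, 0.0, 1.0, 1.0, 0.0]  # hypothetical outcomes of repeated attempts
for n, reward in enumerate(rewards, start=1):
    estimate = update_estimate(estimate, reward, n)

print(estimate)  # approximately 0.6, the average of the observed rewards
```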
