Introduction to Reinforcement Learning

Learning from interaction is a foundational idea underlying nearly all theories of learning and intelligence. The approach we explore, called reinforcement learning, is much more focused on goal-directed learning from interaction than are other approaches to machine learning.
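Reinforcement learning is learning what to do (how to map situations to actions) so as to maximize a numerical reward signal. The learner is not told which actions to take; it must discover which actions yield the most reward by trying them. Actions may also affect not only the immediate reward but the next situation and, through that, all subsequent rewards.
These two characteristics, trial-and-error search and delayed reward, are the two most important distinguishing features of reinforcement learning.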

One of the challenges that arise in reinforcement learning, and not in other kinds of learning, is the trade-off between exploration and exploitation.
Another key feature of reinforcement learning is that it explicitly considers the whole problem of a goal-directed agent interacting with an uncertain environment.
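
The exploration-exploitation trade-off mentioned above can be sketched with an epsilon-greedy rule on a multi-armed bandit. This is only an illustrative sketch: the bandit setting, the names `epsilon_greedy` and `run_bandit`, the Gaussian reward noise, and the parameter values are all assumptions and do not come from the text.

```python
import random

def epsilon_greedy(estimates, epsilon, rng):
    """Pick an action index: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        # Explore: choose uniformly among all actions.
        return rng.randrange(len(estimates))
    # Exploit: choose an action with the highest estimate, breaking ties at random.
    best = max(estimates)
    return rng.choice([a for a, q in enumerate(estimates) if q == best])

def run_bandit(true_means, epsilon, steps, seed=0):
    """Run epsilon-greedy on a hypothetical bandit with unit-variance Gaussian rewards."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)  # current reward estimate per action
    counts = [0] * len(true_means)       # how often each action was tried
    total = 0.0
    for _ in range(steps):
        action = epsilon_greedy(estimates, epsilon, rng)
        reward = rng.gauss(true_means[action], 1.0)
        counts[action] += 1
        # Incremental update of the sample mean for the chosen action.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total += reward
    return estimates, counts, total

estimates, counts, total = run_bandit([0.2, 0.8, 0.5], epsilon=0.1, steps=1000)
print(counts)
```

With `epsilon=0` the agent only exploits its current estimates and may never try an action that is actually better; with `epsilon=1` it only explores and never uses what it has learned. Values in between trade one off against the other.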

One must look beyond the most obvious examples of agents and their environments to appreciate the generality of the reinforcement learning framework.

Features shared by problems that suit reinforcement learning:
All involve interaction between an active decision-making agent and its environment, within which the agent seeks to achieve a goal despite uncertainty about its environment.

Elements of reinforcement learning:

  • agent
  • policy
    A policy is a mapping from perceived states of the environment to actions to be taken when in those states.
  • reward signal
    A reward signal defines the goal in a reinforcement learning problem.
  • value function
    Whereas rewards determine the immediate, intrinsic desirability of environmental states, values indicate the long-term desirability of states after taking into account the states that are likely to follow, and the rewards available in those states.
  • model of the environment (optional)
    Methods for solving reinforcement learning problems that use models and planning are called model-based methods, as opposed to simpler model-free methods that are explicitly trial-and-error learners—viewed as almost the opposite of planning.
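
To make the difference between reward and value concrete, here is a minimal sketch on a hypothetical three-state chain (A, then B, then terminal C). The chain, the names `TRANSITIONS`, `POLICY`, and `evaluate_policy`, and the discount factor `gamma=0.9` are assumptions chosen for illustration; the text itself does not specify them.

```python
# Hypothetical deterministic chain: A -> B -> C, where C is terminal.
# Each entry maps (state, action) to (next_state, reward).
TRANSITIONS = {
    ("A", "right"): ("B", 0.0),  # no immediate reward for leaving A
    ("B", "right"): ("C", 1.0),  # reward arrives only on reaching C
}

# A policy maps each perceived state to the action taken in that state.
POLICY = {"A": "right", "B": "right"}

def evaluate_policy(policy, transitions, gamma=0.9, sweeps=50):
    """Estimate state values under a fixed policy by repeated backup sweeps."""
    values = {state: 0.0 for state in policy}
    for _ in range(sweeps):
        for state, action in policy.items():
            next_state, reward = transitions[(state, action)]
            # Value = immediate reward + discounted value of the state that follows.
            # The terminal state is not in `values`, so it counts as 0.
            values[state] = reward + gamma * values.get(next_state, 0.0)
    return values

print(evaluate_policy(POLICY, TRANSITIONS))
```

Under these assumptions, leaving state A yields an immediate reward of 0, yet A still has a value of about 0.9 because it leads to B, where a reward of 1 is available. This is the sense in which values account for the states that are likely to follow.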

Reinforcement learning uses the formal framework of Markov decision processes to define the interaction between a learning agent and its environment in terms of states, actions and rewards.
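
The interaction described above can be sketched as a loop in which the agent observes a state, chooses an action, and receives a reward and the next state. The `Corridor` environment, its `reset`/`step` interface, and `run_episode` are hypothetical names invented for this sketch; they are not taken from the text or from any particular library.

```python
class Corridor:
    """Hypothetical MDP: positions 0..length-1, start in the middle, both ends terminal."""

    def __init__(self, length=5):
        self.length = length

    def reset(self):
        # Initial state: the middle position.
        return self.length // 2

    def step(self, state, action):
        # action is -1 (left) or +1 (right). The outcome depends only on the
        # current state and action, which is the Markov property.
        next_state = state + action
        done = next_state in (0, self.length - 1)
        reward = 1.0 if next_state == self.length - 1 else 0.0
        return next_state, reward, done

def run_episode(env, policy, max_steps=100):
    """Run one episode and return the list of (state, action, reward) triples."""
    state = env.reset()
    trajectory = []
    for _ in range(max_steps):
        action = policy(state)                              # agent acts
        next_state, reward, done = env.step(state, action)  # environment responds
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory

print(run_episode(Corridor(), lambda state: 1))
```

With the always-right policy shown, the episode is `[(2, 1, 0.0), (3, 1, 1.0)]`: the agent starts at position 2, moves right twice, and is rewarded on reaching the right end.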

Reposted from blog.csdn.net/weixin_42018112/article/details/80456762