The difference between RL, MAB and Contextual Bandits

Contextual Bandits (CB) sit between full RL and the Multi-Armed Bandit (MAB) setting.

  • RL: the action changes the state, and the reward is determined by both state and action
  • CB: the action does not change the state, but the reward is still determined by both state and action
  • MAB: the action does not change the state, and the reward is determined by the action alone
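The three settings can be contrasted as toy interfaces. This is a hypothetical sketch for illustration only; the reward values, the `user_age` feature, and the goal state `5` are made up:

```python
import random

def mab_reward(action):
    # MAB: the reward distribution depends only on the chosen action
    return {"a": 1.0, "b": 0.5}[action] + random.gauss(0, 0.1)

def cb_reward(state, action):
    # CB: the reward depends on (state, action),
    # but the action does not change the state
    return state["user_age"] * (1.0 if action == "a" else -1.0)

def rl_step(state, action):
    # RL: the action changes the state AND the reward
    # depends on (state, action)
    next_state = state + (1 if action == "right" else -1)
    reward = 1.0 if next_state == 5 else 0.0
    return next_state, reward
```

For example, `rl_step(4, "right")` returns `(5, 1.0)`: the action moved the agent into a new state and earned a reward, whereas `cb_reward` and `mab_reward` return a reward but leave the world unchanged.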

LinUCB is a Contextual Bandit algorithm. The basic idea is to approximate the expected reward of each action with a linear function of the context. For each action, learn such an estimation function; when a new state s arrives, estimate the expected reward of every action and then select one according to the UCB rule, which balances exploration against exploitation.
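A minimal sketch of the per-arm (disjoint) variant of LinUCB: each arm keeps a ridge-regression estimate of its reward weights, and the agent picks the arm with the highest upper confidence bound. The simulation environment (two arms, 3-dimensional contexts, the `true_theta` vectors, noise level) is invented here purely to exercise the code:

```python
import numpy as np

class LinUCB:
    """One ridge-regression reward model per arm; choose the arm with the
    highest upper confidence bound (estimate + exploration bonus)."""

    def __init__(self, n_arms, dim, alpha=1.0):
        self.alpha = alpha
        self.A = [np.eye(dim) for _ in range(n_arms)]    # I + sum of x x^T
        self.b = [np.zeros(dim) for _ in range(n_arms)]  # sum of r * x

    def select(self, x):
        scores = []
        for A, b in zip(self.A, self.b):
            A_inv = np.linalg.inv(A)
            theta = A_inv @ b                            # per-arm weight estimate
            bonus = self.alpha * np.sqrt(x @ A_inv @ x)  # confidence width
            scores.append(theta @ x + bonus)
        return int(np.argmax(scores))

    def update(self, arm, x, reward):
        self.A[arm] += np.outer(x, x)
        self.b[arm] += reward * x

# Toy simulation: rewards are linear in the context, plus Gaussian noise.
rng = np.random.default_rng(0)
true_theta = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
agent = LinUCB(n_arms=2, dim=3, alpha=0.5)
hits = 0
for t in range(2000):
    x = rng.normal(size=3)
    arm = agent.select(x)
    agent.update(arm, x, true_theta[arm] @ x + 0.1 * rng.normal())
    hits += arm == int(np.argmax([th @ x for th in true_theta]))
acc = hits / 2000
print(f"fraction of rounds the best arm was chosen: {acc:.2f}")
```

Note the two terms in `select`: `theta @ x` is the greedy (exploitation) estimate, while the `bonus` term grows for contexts the arm has rarely seen, which is what drives exploration.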
