Preface to Multiarmed Bandits: Introduction to Terminology

  Although these are notes on the terminology used in Bandit Algorithms, the terms apply equally to reinforcement learning.

Reward

  A quantitative measure of success. In business, the ultimate reward is profit, but we can usually treat simpler metrics, such as the click-through rate of an ad or the sign-up rate of new users, as rewards. What matters is that (A) there is a clear quantitative scale and (B) more reward is better than less.

Arm

  What options are available to us? What actions can we take? Each available option is called an arm.

Bandit

  A bandit is a collection of arms. When you have many options to choose from, we call the collection of those options a Multiarmed Bandit. A Multiarmed Bandit is a mathematical model you can use to reason about how to make decisions when you have many actions you can take and incomplete information about the rewards you will receive after taking them.

Play/Trial

  When you deal with a bandit problem, you usually get to pull the arms many times. Each chance to pull an arm is called a play or, more technically, a trial.

Horizon

  How many trials you can play before the game ends. The number of trials left is called the horizon. The best strategy usually depends on the length of the horizon: with a long horizon you can afford more aggressive schemes that explore more, because there is still time to benefit from what you learn.
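
  To make plays and the horizon concrete, here is a minimal sketch of a game loop. The function name `play_bandit`, the arm probabilities, and the uniformly random arm choice are all assumptions for illustration; the random choice is only a placeholder for a real strategy.

```python
import random


def play_bandit(arm_probabilities, horizon, seed=0):
    """Play `horizon` trials and record (trials_left, arm, reward) for each."""
    rng = random.Random(seed)
    history = []
    for trial in range(horizon):
        # Trials left before this play, counting the current one.
        trials_left = horizon - trial
        # Placeholder strategy (an assumption): choose an arm uniformly at random.
        arm = rng.randrange(len(arm_probabilities))
        # Each play of an arm pays 1 with that arm's probability, else 0.
        reward = 1 if rng.random() < arm_probabilities[arm] else 0
        history.append((trials_left, arm, reward))
    return history
```

  A call such as `play_bandit([0.2, 0.8], horizon=5)` returns one record per play, with the horizon counting down from 5 to 1.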

Exploitation

  An algorithm for solving the Multiarmed Bandit Problem exploits if it relies on what previous plays have shown, playing the arm that has looked best so far.

Exploration

  An algorithm for solving the Multiarmed Bandit Problem explores if it plays an arm other than the one that has looked best so far, including arms it has never tried before.

Explore/Exploit Dilemma

  Any learning system must strike a compromise between exploration and exploitation. There is no definitive solution to this problem, but the algorithms described in this book provide useful strategies for resolving the conflicting goals of exploration and exploitation.
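
  One simple way to trade off the two goals is an epsilon-greedy rule, sketched below. The function name `select_arm`, the `estimates` list of per-arm average rewards, and the `epsilon` parameter are assumptions for illustration; this is one possible strategy, not the only one.

```python
import random


def select_arm(estimates, epsilon, rng=random):
    """Return an arm index: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        # Explore: pick any arm uniformly at random
        # (this may happen to pick the best-looking arm too).
        return rng.randrange(len(estimates))
    # Exploit: pick the arm with the highest estimated reward so far.
    return max(range(len(estimates)), key=lambda i: estimates[i])
```

  With `epsilon=0.0` the rule always exploits, and with `epsilon=1.0` it always explores; values in between mix the two behaviours.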

Annealing

  An algorithm for solving the Multiarmed Bandit Problem anneals if it explores less over time.
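
  As a rough illustration, annealing can be expressed as an exploration rate that shrinks as the number of completed plays grows. The schedule below, `1 / log(t + e)`, is a hypothetical choice made for this sketch; any decreasing schedule would fit the definition.

```python
import math


def annealed_epsilon(plays_so_far):
    """Exploration rate that starts at 1.0 and decreases with every play."""
    # Adding e keeps the logarithm at 1 for the first play (no division by zero).
    return 1.0 / math.log(plays_so_far + math.e)
```

  The rate falls quickly at first and then more slowly, so the algorithm never stops exploring entirely but does so less and less often.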

Temperature

  A parameter that controls how much the algorithm explores.
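
  A temperature parameter is commonly used in softmax (Boltzmann) arm selection; the glossary does not name a specific algorithm, so treating it that way is an assumption here. In this sketch, `softmax_probabilities` and its arguments are hypothetical names.

```python
import math


def softmax_probabilities(estimates, temperature):
    """Turn per-arm reward estimates into selection probabilities."""
    # Subtracting the maximum avoids overflow and does not change the result.
    best = max(estimates)
    weights = [math.exp((value - best) / temperature) for value in estimates]
    total = sum(weights)
    return [weight / total for weight in weights]
```

  A high temperature makes the probabilities nearly uniform (more exploration), while a temperature close to zero puts almost all probability on the best-looking arm (more exploitation).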

Streaming Algorithms

  An algorithm is a streaming algorithm if it can process data one piece at a time. This is in contrast to batch processing algorithms, which need access to all of the data before they can do anything with it.

Online Learning

  An algorithm is an online learning algorithm if it can not only process data one piece at a time, but also provide provisional results of its analysis after seeing each piece of data.
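
  A running average is perhaps the simplest example of processing data one piece at a time while still giving a provisional answer after each piece. The class name `RunningMean` is a hypothetical helper for this sketch.

```python
class RunningMean:
    """Average of the rewards seen so far, updated one reward at a time."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, reward):
        # Incremental update: no earlier rewards need to be stored.
        self.count += 1
        self.mean += (reward - self.mean) / self.count
        # The current mean is the provisional result after this piece of data.
        return self.mean
```

  A batch version would instead keep every reward in a list and recompute the average from scratch, which requires access to all of the data.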

Active Learning

  An algorithm is an active learning algorithm if it can decide which piece of data it wants to see next in order to learn most effectively. Most traditional machine learning algorithms are not active: they passively accept the data we give them and do not tell us what data to collect next.

Bernoulli

  A Bernoulli system outputs a 1 with probability $p$ and a 0 with probability $1 - p$.
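
  A Bernoulli arm can be simulated in a few lines. The class name `BernoulliArm` and its optional `rng` argument are assumptions for this sketch.

```python
import random


class BernoulliArm:
    """An arm that pays 1 with probability p and 0 with probability 1 - p."""

    def __init__(self, p, rng=None):
        self.p = p
        # Allow a seeded generator to be passed in for reproducible runs.
        self.rng = rng if rng is not None else random.Random()

    def draw(self):
        # random() is uniform on [0, 1), so this is true with probability p.
        return 1 if self.rng.random() < self.p else 0
```

  Averaging many draws from `BernoulliArm(0.3)` should give a value close to 0.3, which is how a bandit algorithm would estimate the arm's reward rate.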

Origin blog.csdn.net/weixin_39059031/article/details/104890468