Although these are my terminology notes from reading Bandit Algorithms, the terms apply equally to reinforcement learning.
Reward
A quantitative measure of success. In business, the ultimate reward is profit, but we can usually treat simpler metrics, such as ad click-through rate or new-user sign-up rate, as the reward. What matters is that the reward has a clear quantitative scale.
Arm
What options are available to us? What actions can we take? Each such option is an arm.
Bandit
A bandit is a collection of arms. When you have many options, the collection of options is called a Multiarmed Bandit. A Multiarmed Bandit is a mathematical model you can use to reason about how to make decisions when you have many actions available and incomplete information about the rewards you will receive after taking them.
Play/Trial
When you deal with a bandit problem, you usually get to pull arms many times. Each pull of an arm is called a play (or trial).
Horizon
The horizon is the number of trials you can still play before the game ends (the number of trials left is called the horizon). The strategy you use usually differs with the length of the horizon, because a longer horizon lets you use more aggressive schemes that increase exploration.
Exploitation
An algorithm for solving the Multiarmed Bandit problem exploits when it relies more on the results of previous plays, choosing the arm that has performed best so far.
Exploration
An algorithm for solving the Multiarmed Bandit problem explores when it puts more weight on trying plays it has not tried before.
Explore/Exploit Dilemma
Any learning system must strike a compromise between exploration and exploitation. There is no definitive solution to this problem, but the algorithms described in the book provide useful strategies for reconciling the conflicting goals of exploration and exploitation.
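One common way to make this trade-off concrete is an epsilon-greedy rule. The sketch below is only an illustration under assumptions of my own: the function name `epsilon_greedy`, the reward probabilities, and the default `epsilon=0.1` are all hypothetical, and each arm is assumed to pay 1 or 0 at random. With probability `epsilon` the algorithm explores a random arm; otherwise it exploits the arm with the best estimate so far.

```python
import random

def epsilon_greedy(reward_probs, horizon, epsilon=0.1, seed=0):
    """Play a simulated bandit for `horizon` trials (hypothetical sketch).

    reward_probs[i] is the assumed chance that arm i pays a reward of 1.
    """
    rng = random.Random(seed)
    n_arms = len(reward_probs)
    counts = [0] * n_arms    # number of plays per arm
    values = [0.0] * n_arms  # running mean reward per arm
    total_reward = 0
    for _ in range(horizon):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick any arm at random
        else:
            # exploit: pick the arm with the highest estimate (ties go to the first)
            arm = max(range(n_arms), key=lambda a: values[a])
        reward = 1 if rng.random() < reward_probs[arm] else 0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]
        total_reward += reward
    return counts, values, total_reward

counts, values, total = epsilon_greedy([0.1, 0.5, 0.9], horizon=1000)
print(counts, total)
```

Setting `epsilon=0` gives pure exploitation, and `epsilon=1` gives pure exploration; anything in between is a compromise.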
Annealing
An algorithm for solving the Multiarmed Bandit problem anneals if it explores less over time.
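As a rough illustration, annealing can be expressed as an exploration rate that shrinks with the number of plays. The schedule below is a hypothetical choice made for this sketch (the name `annealed_epsilon` and the parameters `start` and `decay` are assumptions, not something the notes prescribe); any schedule that decreases over time would fit the definition.

```python
def annealed_epsilon(t, start=1.0, decay=0.01):
    """Exploration probability after t plays (hypothetical schedule)."""
    return start / (1.0 + decay * t)

# The exploration rate falls as the number of plays grows.
for t in (0, 100, 1000, 10000):
    print(t, annealed_epsilon(t))
```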
Temperature
A parameter that controls how much the algorithm explores.
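To show how such a parameter might work, here is a minimal sketch assuming a Softmax-style selection rule, which these notes do not name explicitly. The function names `softmax_probs` and `select_arm` and the example values are hypothetical. A high temperature makes the arms nearly equally likely (more exploration); a low temperature concentrates the probability on the arm with the best estimate (more exploitation).

```python
import math
import random

def softmax_probs(values, temperature):
    """Turn estimated arm values into selection probabilities."""
    scaled = [v / temperature for v in values]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def select_arm(values, temperature, rng):
    """Sample an arm index according to the softmax probabilities."""
    probs = softmax_probs(values, temperature)
    return rng.choices(range(len(values)), weights=probs, k=1)[0]

estimates = [0.1, 0.5, 0.9]  # assumed value estimates for three arms
print(softmax_probs(estimates, temperature=100.0))  # close to uniform
print(softmax_probs(estimates, temperature=0.01))   # almost all mass on the last arm
print(select_arm(estimates, 0.5, random.Random(0)))
```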
Streaming Algorithms
An algorithm is a streaming algorithm if it processes data one piece at a time. This is in contrast to batch processing algorithms, which require access to all of the data before they can process it.
Online Learning
An online learning algorithm not only processes one piece of data at a time, but also provides provisional results of its analysis after seeing each piece of data.
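A running average is perhaps the simplest example of this idea. The sketch below uses a hypothetical `RunningMean` class: it sees one reward at a time, never stores the full history, and returns an updated estimate after every observation.

```python
class RunningMean:
    """Incrementally updated mean; keeps only a count and the current mean."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, x):
        # Shift the old mean toward the new observation by 1/count of the gap.
        self.count += 1
        self.mean += (x - self.mean) / self.count
        return self.mean

rm = RunningMean()
estimates = [rm.update(r) for r in [1, 0, 1, 1]]
print(estimates)  # one provisional estimate per observation
```

A batch version would instead wait for the whole list and compute `sum(rewards) / len(rewards)` once.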
Active Learning
An algorithm is an active learning algorithm if it can decide which piece of data it wants to see next in order to learn most effectively. Most traditional machine learning algorithms are not active: they passively accept the data we provide without telling us what data we should collect next.
Bernoulli
A Bernoulli system outputs a 1 with probability p and a 0 with probability 1 - p.
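A simulated arm with Bernoulli rewards can be sketched in a few lines. The class name `BernoulliArm`, the symbol `p` for the probability of a 1, and the value 0.3 are assumptions chosen for illustration.

```python
import random

class BernoulliArm:
    """Hypothetical arm that pays 1 with probability p and 0 otherwise."""

    def __init__(self, p, seed=None):
        self.p = p
        self.rng = random.Random(seed)

    def draw(self):
        # random() is uniform on [0, 1), so this is 1 with probability p.
        return 1 if self.rng.random() < self.p else 0

arm = BernoulliArm(0.3, seed=42)
draws = [arm.draw() for _ in range(10000)]
print(sum(draws) / len(draws))  # empirical frequency of 1s, near p for many draws
```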
My WeChat public account: Deep Learning and Advanced Intelligent Decision-Making. Account ID: MultiAgent1024. About the account: it mainly covers deep learning, reinforcement learning, machine game playing, and related topics. You are welcome to follow it so we can learn and exchange ideas together.