Upper-Confidence-Bound (UCB) Action Selection

Background

In the ε-greedy method, non-greedy actions are chosen at random for exploration, but indiscriminately, with no preference for those that are nearly greedy or particularly uncertain.

Upper-Confidence-Bound

In order to take into account both how close their estimates are to being maximal and the uncertainty in those estimates, one effective way is to select actions according to:

$$A_t \doteq \underset{a}{\arg\max}\left[Q_t(a) + c\sqrt{\frac{\ln t}{N_t(a)}}\right]$$

  • $N_t(a)$ denotes the number of times that action $a$ has been selected prior to time $t$. If $N_t(a)=0$, then $a$ is considered to be a maximizing action.
  • $c>0$ controls the degree of exploration and determines the confidence level.
  • The use of the natural logarithm $\ln t$ means that the increases get smaller over time but are unbounded: all actions will eventually be selected. However, actions with lower value estimates, or that have already been selected frequently, will be selected with decreasing frequency over time.
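
As a concrete illustration of the selection rule above, here is a minimal Python/NumPy sketch. The function name `ucb_select` and its arguments (`q_values` for the estimates $Q_t(a)$, `counts` for $N_t(a)$) are illustrative assumptions, not part of the original post:

```python
import numpy as np

def ucb_select(q_values, counts, t, c=2.0):
    """Pick an action by the UCB rule: argmax_a [Q_t(a) + c*sqrt(ln t / N_t(a))]."""
    # An action with N_t(a) = 0 is considered maximizing, so every action
    # is tried once before the square-root bonus comes into play.
    untried = np.where(counts == 0)[0]
    if untried.size > 0:
        return int(untried[0])
    bonus = c * np.sqrt(np.log(t) / counts)
    return int(np.argmax(q_values + bonus))
```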

The idea of UCB action selection is that the square-root term $c\sqrt{\frac{\ln t}{N_t(a)}}$ is a measure of the uncertainty or variance in the estimate of $a$'s value. The quantity being maximized over is thus a sort of upper bound on the possible true value of action $a$. Each time $a$ is selected, $N_t(a)$ grows and the uncertainty term shrinks. Conversely, each time an action other than $a$ is selected, $t$ increases while $N_t(a)$ does not, so the uncertainty term grows.
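
To see this dynamic in action, the short simulation below (continuing the `ucb_select` sketch above) runs the rule on a hypothetical 3-armed Gaussian bandit with incremental sample-average estimates; the arm means and the choice $c=2$ are made up for illustration:

```python
rng = np.random.default_rng(0)
true_means = np.array([0.1, 0.5, 0.9])  # hypothetical arm values
q_values = np.zeros(3)
counts = np.zeros(3)

for t in range(1, 1001):
    a = ucb_select(q_values, counts, t, c=2.0)
    reward = rng.normal(true_means[a], 1.0)
    counts[a] += 1
    # Incremental sample-average update of Q_t(a).
    q_values[a] += (reward - q_values[a]) / counts[a]

print(counts)  # the best arm (index 2) should dominate, but every arm gets tried
```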

Limitations

  1. UCB is more difficult than the ε-greedy method to extend beyond bandit problems to more general reinforcement learning settings.
  2. UCB has difficulty dealing with large state spaces and with nonstationary problems.

Reposted from blog.csdn.net/lun55423/article/details/111997844