Categorized ICML Reinforcement Learning Papers

| No. | Paper | Keywords | Summary |
| --- | --- | --- | --- |
| 61 | Reinforcement Learning with General Utilities: Simpler Variance Reduction and Large State-Action Space | General utilities, PG | Introduces a gradient method for RL with general utilities, i.e. objectives that are nonlinear functions of the state-action pair distribution. |
| 62 | Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning | Off-policy, importance sampling | Introduces RBIS, an importance-sampling algorithm for off-policy learning that expresses the trace as a function of t rather than as a product of IS ratios. |
| 63 | Semi-Offline Reinforcement Learning for Optimized Text Generation | Semi-offline, LLM | Introduces a semi-offline RL method for training language models that reuses the model's training data and requires only a single inference step. |
| 64 | StriderNet: A Graph Reinforcement Learning Approach to Optimize Atomic Structures on Rough Energy Landscapes | GNN, atomic structures, RL application | Uses RL to optimize atomic structures, with a GNN extracting features of the atomic structure. |
| 65 | Reinforcement Learning Can Be More Efficient with Multiple Rewards | - | - |
| 66 | LESSON: Learning to Integrate Exploration Strategies for Reinforcement Learning via an Option Framework | - | - |
| 67 | Flipping Coins to Estimate Pseudocounts for Exploration in Reinforcement Learning | Exploration | Uses a network to estimate whether each step of a trajectory visits a given state, from which a corresponding exploration bonus is computed. |
| 68 | Interactive Object Placement with Reinforcement Learning | - | - |
| 69 | Oracles and Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning | Stackelberg equilibria, game theory, RL application | Introduces a framework for finding Stackelberg equilibria among multiple agents. |
| 70 | Non-stationary Reinforcement Learning under General Function Approximation | Non-stationary | Introduces SW-OPEA, an RL algorithm for non-stationary environments that filters the candidate function class with a sliding-window, confidence-based criterion. |
| 71 | Multi-task Hierarchical Adversarial Inverse Reinforcement Learning | IL, IRL, multi-task | Introduces MH-AIRL, a hierarchical imitation-learning algorithm that improves on AIRL and applies to multi-task settings. |
| 72 | Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning | Multi-agent | Proposes a PPO variant for multi-agent tasks that updates agents in a fixed order, conditioning each agent's update on the actions of previously updated agents. |
| 73 | Entity Divider with Language Grounding in Multi-Agent Reinforcement Learning | Multi-agent, LLM | Introduces EnDi, a multi-agent RL framework that partitions the entities each agent interacts with, avoiding sub-goal conflicts and improving generalization. |
| 74 | Parallel Q-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation | Parallel | Introduces PQL, a parallel RL algorithm that extends DDQN and updates the Q-function and policy in parallel. |
| 75 | Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement Learning | - | - |
| 76 | Language Instructed Reinforcement Learning for Human-AI Coordination | LLM | Introduces instructRL, an RL framework that uses human instructions to correct the Q-function; it modifies Q-learning and PPO and improves human-AI coordination. |
| 77 | Representation-Driven Reinforcement Learning | Exploration | Introduces ReRL, an RL framework that turns the exploration problem into a representation problem by representing the policy's parameters as a value used for exploration. |
| 78 | Efficient Online Reinforcement Learning with Offline Data | Offline | Introduces RLPD, a framework that exploits offline data by mixing offline and online data and adding LayerNorm, among other techniques. |
| 79 | Reinforcement Learning with History Dependent Dynamic Contexts | Non-stationary | Introduces DCMDP, a dynamic Markov decision process that uses a feature mapping to capture a history vector; LDC-UCB, a maximum-likelihood method for solving the feature mapping; and DCZero, a model-based method. |
| 80 | Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation | Exploration, adversarial cost | Introduces PO-LSBE, a least-squares-based RL algorithm that encourages exploration in environments with adversarial (time-varying) costs. |
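For context on entry 62: the standard per-decision importance-sampling trace that RBIS improves upon is the cumulative product of ratios between target-policy and behavior-policy action probabilities. A minimal NumPy sketch of that baseline (the function name and example probabilities are illustrative, not from the paper):

```python
import numpy as np

def is_trace(pi_probs, mu_probs):
    """Trace coefficient at each step t as the cumulative product of
    importance-sampling ratios pi(a_t|s_t) / mu(a_t|s_t).
    RBIS instead represents the trace directly as a function of t,
    avoiding the variance of this product."""
    ratios = np.asarray(pi_probs, dtype=float) / np.asarray(mu_probs, dtype=float)
    return np.cumprod(ratios)

# Target policy assigns [0.5, 0.4] to the taken actions,
# behavior policy assigned [0.25, 0.8]:
print(is_trace([0.5, 0.4], [0.25, 0.8]))  # [2.0, 1.0]
```

The product's variance grows with trajectory length, which is the motivation for trajectory-aware alternatives.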
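For context on entry 67: pseudocount methods approximate the classic count-based exploration bonus, which rewards rarely visited states. A generic sketch of that bonus (not the paper's coin-flipping estimator; the function and coefficient `beta` are illustrative):

```python
import math
from collections import defaultdict

def exploration_bonus(counts, state, beta=0.1):
    """Generic count-based bonus beta / sqrt(N(s) + 1).
    Pseudocount methods such as entry 67's estimate N(s) with a
    learned model when exact counting is infeasible."""
    return beta / math.sqrt(counts[state] + 1)

counts = defaultdict(int)
counts["s0"] += 1
# An unvisited state earns a larger bonus than a visited one.
assert exploration_bonus(counts, "s_new") > exploration_bonus(counts, "s0")
```

The bonus is added to the environment reward, so the agent is drawn toward states whose (pseudo)counts are low.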

Reprinted from blog.csdn.net/HGGshiwo/article/details/131149707