ICML Reinforcement Learning Article Classification

| # | Article | Key words | Summary |
| --- | --- | --- | --- |
| 61 | Reinforcement Learning with General Utilities: Simpler Variance Reduction and Large State-Action Space | General Utilities, PG | Introduces a policy gradient method for general utilities, i.e., objectives that are nonlinear functions of the state-action occupancy distribution. |
| 62 | Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning | off-policy, importance sampling | Introduces RBIS, an importance-sampling algorithm for off-policy learning that expresses the eligibility trace as a function of the timestep t rather than as a product of IS ratios (see the first sketch after the table). |
| 63 | Semi-Offline Reinforcement Learning for Optimized Text Generation | semi-offline, LLM | Introduces a semi-offline reinforcement learning method for training language models: starting from the training data, only a single step of model inference is required. |
| 64 | StriderNet: A Graph Reinforcement Learning Approach to Optimize Atomic Structures on Rough Energy Landscapes | GNN, atomic structures, RL applications | Optimizes atomic structures with reinforcement learning, using a GNN to extract features of the atomic structure. |
| 65 | Reinforcement Learning Can Be More Efficient with Multiple Rewards | - | - |
| 66 | LESSON: Learning to Integrate Exploration Strategies for Reinforcement Learning via an Option Framework | - | - |
| 67 | Flipping Coins to Estimate Pseudocounts for Exploration in Reinforcement Learning | Exploration | Uses a network to estimate a pseudocount of how often each state along the trajectory has been visited, and computes a corresponding exploration bonus (see the second sketch after the table). |
| 68 | Interactive Object Placement with Reinforcement Learning | - | - |
| 69 | Oracles and Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning | Stackelberg equilibria, game theory, RL applications | Introduces a framework for solving Stackelberg equilibrium problems with multiple agents. |
| 70 | Non-stationary Reinforcement Learning under General Function Approximation | Non-stationary | Introduces SW-OPEA, a reinforcement learning algorithm for non-stationary environments that uses a sliding window and confidence-set conditions when screening the candidate function class. |
| 71 | Multi-task Hierarchical Adversarial Inverse Reinforcement Learning | IL, IRL, multi-task | Introduces MH-AIRL, a hierarchical imitation learning algorithm that improves on AIRL and extends it to multi-task settings. |
| 72 | Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning | Multi-agent | Proposes a PPO algorithm for multi-agent tasks: agents are updated in a fixed order, and each update is conditioned on the actions of the previously updated agents. |
| 73 | Entity Divider with Language Grounding in Multi-Agent Reinforcement Learning | Multi-agent, LLM | Introduces EnDi, a multi-agent reinforcement learning framework that avoids sub-goal conflicts and improves generalization by dividing the entities each agent needs to interact with. |
| 74 | Parallel Q-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation | Parallel | Introduces PQL, a parallel reinforcement learning algorithm that extends DDQN and updates the Q-function and policy in parallel. |
| 75 | Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement Learning | - | - |
| 76 | Language Instructed Reinforcement Learning for Human-AI Coordination | LLM | Introduces InstructRL, a reinforcement learning framework that uses human instructions to correct the Q-function; it extends Q-learning and PPO and improves human-AI coordination. |
| 77 | Representation-Driven Reinforcement Learning | Exploration | Introduces ReRL, a framework that recasts exploration as a representation problem by treating the policy's parameters as the representation over which exploration is performed. |
| 78 | Efficient Online Reinforcement Learning with Offline Data | offline | Introduces RLPD, a framework that exploits offline data by mixing offline and online samples in each batch, adding LayerNorm, and other techniques (see the third sketch after the table). |
| 79 | Reinforcement Learning with History Dependent Dynamic Contexts | Non-stationary | Introduces DCMDP, a dynamic Markov decision process that uses a feature mapping to summarize history into a vector; LDC-UCB, a method that learns the feature mapping by maximum likelihood; and DCZero, a model-based method. |
| 80 | Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation | Exploration, adversarial cost | Introduces PO-LSBE, a least-squares-based reinforcement learning algorithm that encourages exploration in environments with adversarial (time-varying) costs. |
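To make entry 62's setting concrete, here is textbook tabular off-policy TD(λ) with importance-sampling eligibility traces, the baseline whose ratio products RBIS is described as avoiding. This is a minimal sketch, not the RBIS algorithm itself; the constants and the chain used in the usage lines are illustrative assumptions.

```python
import numpy as np

# Tabular off-policy TD(lambda) with importance-sampling eligibility traces.
# Over k steps, the credit a state keeps in `trace` has been multiplied by k
# successive IS ratios -- the variance source that RBIS's time-indexed trace
# is meant to sidestep. GAMMA/LAM/ALPHA values are illustrative.
GAMMA, LAM, ALPHA = 0.99, 0.9, 0.1

def td_lambda_step(v, trace, s, r, s_next, rho):
    """One update from a behavior-policy transition (s, r, s_next).

    v:     value table, shape (n_states,), updated in place
    trace: eligibility trace, same shape, updated in place
    rho:   pi_target(a|s) / pi_behavior(a|s) for the action actually taken
    """
    trace *= rho * GAMMA * LAM             # old credit picks up another ratio
    trace[s] += rho                        # current state enters the trace
    delta = r + GAMMA * v[s_next] - v[s]   # TD error toward the target policy
    v += ALPHA * delta * trace

# Illustrative usage on a 4-state chain with made-up rewards and ratios.
v, trace = np.zeros(4), np.zeros(4)
td_lambda_step(v, trace, s=0, r=1.0, s_next=1, rho=1.3)
td_lambda_step(v, trace, s=1, r=0.0, s_next=2, rho=0.6)
```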
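Next, a minimal sketch of the coin-flip pseudocount idea in entry 67, as I read it: each visit to a state draws a ±1 coin, and a regressor fitted to those coins converges to their running mean, whose magnitude behaves like 1/√n(s). The paper trains a neural network for this; the per-state running mean below stands in for it, and all sizes and constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N_STATES = 5
coin_mean = np.zeros(N_STATES)  # least-squares fit to the coins seen so far
visits = np.zeros(N_STATES)     # explicit counts, kept only for comparison

def visit(s):
    """Record one visit: flip a coin and fold it into the running mean."""
    c = rng.choice((-1.0, 1.0))
    visits[s] += 1
    coin_mean[s] += (c - coin_mean[s]) / visits[s]

def exploration_bonus(s, beta=1.0):
    # E[(mean of n coin flips)^2] = 1/n, so |coin_mean| scales like
    # 1/sqrt(n(s)) -- a visit-count signal recovered without storing counts.
    return beta * abs(coin_mean[s])

for _ in range(10_000):
    visit(int(rng.integers(N_STATES)))     # roughly uniform visitation
print([round(exploration_bonus(s), 3) for s in range(N_STATES)])
```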
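Finally, the offline/online mixing attributed to RLPD in entry 78 amounts to symmetric sampling: each training batch draws half its transitions from the offline dataset and half from the online replay buffer. The 50/50 split and the flat array-of-transitions layout below are assumptions for illustration, not RLPD's exact interface.

```python
import numpy as np

def mixed_batch(offline, online, batch_size, rng):
    """Symmetric sampling: half of the batch offline, half online."""
    half = batch_size // 2
    off = offline[rng.integers(len(offline), size=half)]
    on = online[rng.integers(len(online), size=batch_size - half)]
    return np.concatenate([off, on])

# Illustrative usage with dummy transition records.
rng = np.random.default_rng(0)
offline_data = rng.normal(size=(1000, 8))   # pretend (s, a, r, s') rows
online_buffer = rng.normal(size=(200, 8))
batch = mixed_batch(offline_data, online_buffer, batch_size=256, rng=rng)
assert batch.shape == (256, 8)
```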

Origin: blog.csdn.net/HGGshiwo/article/details/131149707