No. | Article | Keywords | Summary |
---|---|---|---|
61 | Reinforcement Learning with General Utilities: Simpler Variance Reduction and Large State-Action Space | General Utilities, PG | Introduces a gradient method for RL with general utilities, i.e., objectives that are a nonlinear function of the state-action occupancy distribution |
62 | Trajectory-Aware Eligibility Traces for Off-Policy Reinforcement Learning | off-policy, importance sampling | Introduces RBIS, an importance-sampling algorithm for off-policy learning that expresses the eligibility trace as a function of the timestep t rather than as a product of per-step IS ratios |
63 | Semi-Offline Reinforcement Learning for Optimized Text Generation | semi-offline, LLM | Introduces a semi-offline RL method for training language models that reuses the model's training data so that only one step of model inference is required |
64 | StriderNet: A Graph Reinforcement Learning Approach to Optimize Atomic Structures on Rough Energy Landscapes | GNN, atomic structures, RL applications | Optimizes atomic structures with reinforcement learning, using a GNN to extract features of the atomic structure |
65 | Reinforcement Learning Can Be More Efficient with Multiple Rewards | - | |
66 | LESSON: Learning to Integrate Exploration Strategies for Reinforcement Learning via an Option Framework | - | |
67 | Flipping Coins to Estimate Pseudocounts for Exploration in Reinforcement Learning | Exploration | Uses a network to estimate whether each step of a trajectory has visited a given state, and derives a corresponding exploration bonus from that estimate |
68 | Interactive Object Placement with Reinforcement Learning | - | |
69 | Oracles and Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning | Stackelberg equilibria, game theory, RL applications | Introduces a framework for solving Stackelberg equilibrium problems with multiple agents |
70 | Non-stationary Reinforcement Learning under General Function Approximation | Non-stationary | Introduces SW-OPEA, an RL algorithm for non-stationary environments that uses a sliding window and confidence-based conditions when selecting functions from the policy set |
71 | Multi-task Hierarchical Adversarial Inverse Reinforcement Learning | IL, IRL, Multi-task | Introduces MH-AIRL, a hierarchical imitation-learning algorithm that improves on AIRL and extends it to multi-task settings |
72 | Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning | Multi-Agent | Proposes a multi-agent PPO algorithm in which agents are updated in a fixed order, each update conditioning on the actions of the previously updated agents |
73 | Entity Divider with Language Grounding in Multi-Agent Reinforcement Learning | Multi-Agent, LLM | Introduces EnDi, a multi-agent RL framework that avoids sub-goal conflicts and improves generalization by partitioning the entities each agent interacts with |
74 | Parallel Q-Learning: Scaling Off-policy Reinforcement Learning under Massively Parallel Simulation | Parallel | Introduces PQL, a parallel RL algorithm extending DDQN that updates the Q-function and the policy in parallel |
75 | Simple Embodied Language Learning as a Byproduct of Meta-Reinforcement Learning | - | |
76 | Language Instructed Reinforcement Learning for Human-AI Coordination | LLM | Introduces instructRL, an RL framework that uses human language instructions to regularize the Q-function, modifying Q-learning and PPO to improve human-AI coordination |
77 | Representation-Driven Reinforcement Learning | Exploration | Introduces ReRL, an RL framework that recasts exploration as a representation problem by embedding the policy's parameters into a representation used to drive exploration |
78 | Efficient Online Reinforcement Learning with Offline Data | offline | Introduces RLPD, a framework that exploits offline data during online training by mixing offline and online samples, adding LayerNorm, and other techniques |
79 | Reinforcement Learning with History Dependent Dynamic Contexts | Non-stationary | Introduces DCMDP, a dynamic Markov decision process that uses a feature mapping to capture a history vector; LDC-UCB, a maximum-likelihood method for learning that feature mapping; and DCZero, a model-based method |
80 | Improved Regret for Efficient Online Reinforcement Learning with Linear Function Approximation | Exploration, adversarial cost | Introduces PO-LSBE, a least-squares-based RL algorithm that encourages exploration in environments with time-varying (adversarial) losses |
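The point made in row 62 (expressing the trace as a function of the timestep t rather than as a product of per-step IS ratios) is easiest to see numerically: a product of importance-sampling ratios grows or shrinks exponentially with trajectory length. A minimal sketch of the standard product-form trace, with illustrative ratios that are not taken from the paper:

```python
def product_is_trace(ratios):
    """Standard off-policy trace: the running product of per-step
    importance-sampling ratios. Its magnitude (and variance) compounds
    multiplicatively with trajectory length."""
    traces = []
    trace = 1.0
    for r in ratios:
        trace *= r
        traces.append(trace)
    return traces

# Even mildly off-policy ratios compound quickly over a long trajectory.
traces = product_is_trace([1.5] * 20)
```

Here a constant per-step ratio of 1.5 yields a trace above 3000 after 20 steps, which is why trajectory-aware alternatives that depend on t directly can be better behaved.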
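The exploration bonus in row 67 follows the usual count-based pattern: states visited less often earn a larger bonus. A minimal sketch using exact visit counts in place of the paper's network-estimated pseudocounts; the `beta / sqrt(N)` form and the `beta` value are illustrative assumptions, not the paper's exact formula:

```python
import math
from collections import defaultdict

def exploration_bonus(counts, state, beta=0.1):
    """Count-based bonus beta / sqrt(N(state)): decays as a state
    is visited more often, rewarding novelty."""
    n = counts[state]
    return beta / math.sqrt(n) if n > 0 else beta

# Accumulate visit counts along a toy trajectory and score each step.
counts = defaultdict(int)
trajectory = ["s0", "s1", "s0", "s2", "s0"]
bonuses = []
for s in trajectory:
    counts[s] += 1
    bonuses.append(exploration_bonus(counts, s))
```

Each repeat visit to `s0` receives a strictly smaller bonus than the one before it, which is the monotone-decay property any pseudocount estimator is meant to preserve.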
ICML Reinforcement Learning Article Classification
Origin blog.csdn.net/HGGshiwo/article/details/131149707