Playing Atari with Deep Reinforcement Learning: Paper Notes

1. Abstract

We present the first deep learning model to successfully learn control policies directly from high-dimensional sensory input using reinforcement learning. The model is a convolutional neural network, trained with a variant of Q-learning, whose input is raw pixels and whose output is a value function estimating future rewards. We apply our method to seven Atari 2600 games from the Arcade Learning Environment, with no adjustment of the architecture or learning algorithm. We find that it outperforms all previous approaches on six of the games and surpasses a human expert on three of them.
Summary: a single convolutional network, trained with a variant of Q-learning, takes raw pixels as input and outputs a value function estimating future rewards. Applied to seven Atari 2600 games from the Arcade Learning Environment with no per-game adjustment of the architecture or learning algorithm, it outperforms all previous approaches on six of the games and surpasses a human expert on three of them.
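
As described in the abstract, the network maps raw pixels to one Q-value per action. Below is a minimal PyTorch sketch of such a network, assuming the preprocessing and layer sizes reported in the full paper (four stacked 84x84 grayscale frames, two convolutional layers, one fully connected layer); the class name `QNetwork` and all hyperparameters here are illustrative, not taken from the abstract itself.

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Maps a stack of preprocessed frames to one Q-value per action."""
    def __init__(self, n_actions: int, in_frames: int = 4):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_frames, 16, kernel_size=8, stride=4),  # 84x84 -> 20x20
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2),         # 20x20 -> 9x9
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256),
            nn.ReLU(),
            nn.Linear(256, n_actions),  # one Q-value per possible action
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

# Example: a batch of 32 observations, each 4 stacked 84x84 frames.
q_values = QNetwork(n_actions=6)(torch.zeros(32, 4, 84, 84))
print(q_values.shape)  # torch.Size([32, 6])
```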

2. Approach

The paper shows that a convolutional neural network can overcome these challenges and learn successful control policies from raw video data in complex RL environments. The network is trained with a variant of the Q-learning [26] algorithm, using stochastic gradient descent to update the weights. To alleviate the problems of correlated data and non-stationary distributions, an experience replay mechanism [13] randomly samples previous transitions, thereby smoothing the training distribution over many past behaviors.
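
A minimal sketch of the experience replay idea, assuming transitions are stored as `(state, action, reward, next_state, done)` tuples in a fixed-size ring buffer; the class name `ReplayBuffer` and the capacity are illustrative.

```python
import random
from collections import deque

class ReplayBuffer:
    """Stores past transitions and returns uniformly random minibatches."""
    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped first

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```

Sampling uniformly at random from this buffer is what breaks the correlation between consecutive frames and averages the update over many past behaviors, rather than over whatever the agent happens to be doing right now.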

3. Goals

Our goal is to create a single neural network agent that is able to successfully learn to play as many of the games as possible.
(it learned from nothing but the video input, the reward and terminal signals, and the set of possible actions—just as a human player would.)

Our goal is to connect a reinforcement learning algorithm to a deep neural network which operates directly on RGB images and efficiently process training data by using stochastic gradient updates.
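
A hedged sketch of what one such stochastic gradient update could look like, using the one-step Q-learning target y = r + γ·max_a' Q(s', a') on a minibatch drawn from the replay buffer; it assumes the `QNetwork` sketch above and that the sampled batch has already been converted to PyTorch tensors. The function name `dqn_update` and the hyperparameters are illustrative.

```python
import torch
import torch.nn.functional as F

def dqn_update(q_net, optimizer, batch, gamma=0.99):
    """One stochastic gradient step on the squared Q-learning error."""
    states, actions, rewards, next_states, dones = batch
    # Q(s, a) for the actions that were actually taken
    q_pred = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # y = r + gamma * max_a' Q(s', a'), with no bootstrap at terminal states
        q_next = q_net(next_states).max(dim=1).values
        target = rewards + gamma * (1.0 - dones) * q_next
    loss = F.mse_loss(q_pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example wiring (illustrative hyperparameters):
# q_net = QNetwork(n_actions=6)
# optimizer = torch.optim.RMSprop(q_net.parameters(), lr=2.5e-4)
# loss = dqn_update(q_net, optimizer, batch)
```

Note that this sketch bootstraps the target from the same network being trained, which is consistent with the original 2013 formulation; the separate target network appears only in later DQN work.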


Reposted from blog.csdn.net/weixin_41913844/article/details/84061899