DQN (Deep Q Network) Introduction

1. Introduction

The Deep Q Network (DQN) belongs to reinforcement learning. This article is a translated summary of "Playing Atari with Deep Reinforcement Learning".

The model combines a convolutional neural network with Q-learning. The input is a map of raw pixels; the output is a value function estimating future rewards. Training uses a stochastic gradient descent method.
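The article does not reproduce the network details, but the original paper describes two convolutional layers followed by a fully connected layer, mapping a stack of preprocessed frames to one Q-value per action. Below is a minimal PyTorch sketch with the layer sizes stated in the paper; the class and variable names are my own, so treat it as an illustration rather than the authors' code.

```python
# Sketch of the paper's Q-network: stacked pixel frames in, one Q-value per action out.
# Layer sizes follow the 2013 paper (84x84x4 input, 16 and 32 conv filters, 256 hidden
# units); names are illustrative, not the authors' original code.
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    def __init__(self, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4),  # 4 stacked grayscale frames
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2),
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256),                 # 84x84 input -> 9x9 feature maps
            nn.ReLU(),
            nn.Linear(256, num_actions),                # one Q-value per action
        )

    def forward(self, x):
        return self.net(x)

q = QNetwork(num_actions=4)
print(q(torch.zeros(1, 4, 84, 84)).shape)  # torch.Size([1, 4])
```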

From the perspective of deep learning, reinforcement learning faces three challenges: first, deep learning requires large amounts of manually labeled training data; second, most deep learning assumes the data samples are independent, while reinforcement learning must handle sequences of highly correlated states; third, deep learning assumes a fixed data distribution, whereas in reinforcement learning the distribution changes as the algorithm learns new behaviors.

This model attempts to solve these problems. The convolutional neural network removes the need for large amounts of manually labeled training data. To reduce the problems of correlated data and a non-stationary distribution, the model uses an experience replay mechanism: it samples randomly from previous transitions, smoothing the training distribution over many past behaviors.
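A minimal sketch of such a replay buffer, assuming transitions are stored as (state, action, reward, next_state, done) tuples; the tuple layout and capacity are illustrative choices, not taken from the article:

```python
# Minimal sketch of the experience replay mechanism described above.
import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity: int = 100_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are discarded automatically

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int):
        # Uniform random sampling breaks the correlation between consecutive
        # states and smooths the training distribution over past behaviors.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```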

2. Algorithms

[Figure: the paper's deep Q-learning with experience replay algorithm (pseudocode)]
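Since only a figure appeared here in the original post, here is a minimal sketch of one training step of deep Q-learning with experience replay: act epsilon-greedily, store the transition, then fit the Q-network to the one-step target r + γ·max_a' Q(s', a'). It reuses the QNetwork and ReplayBuffer sketches above; the hyperparameter values and the fixed epsilon are placeholders, not the paper's settings.

```python
# Illustrative sketch of deep Q-learning with experience replay; `q` and the
# ReplayBuffer come from the sketches above, hyperparameters are placeholders.
import random
import torch
import torch.nn.functional as F

gamma, batch_size = 0.99, 32
epsilon = 0.1  # the paper anneals epsilon from 1.0 down to 0.1; fixed here for brevity
optimizer = torch.optim.SGD(q.parameters(), lr=1e-3)  # the paper uses an SGD variant

def select_action(state, num_actions=4):
    # Epsilon-greedy exploration: random action with probability epsilon.
    if random.random() < epsilon:
        return random.randrange(num_actions)
    with torch.no_grad():
        return q(state.unsqueeze(0)).argmax(dim=1).item()

def train_step(buffer):
    # Assumes states were stored as tensors of shape (4, 84, 84).
    states, actions, rewards, next_states, dones = zip(*buffer.sample(batch_size))
    states, next_states = torch.stack(states), torch.stack(next_states)
    actions = torch.tensor(actions)
    rewards = torch.tensor(rewards, dtype=torch.float32)
    dones = torch.tensor(dones, dtype=torch.float32)

    # One-step Q-learning target: r + gamma * max_a' Q(s', a'); just r at terminal states.
    with torch.no_grad():
        targets = rewards + gamma * (1.0 - dones) * q(next_states).max(dim=1).values
    predicted = q(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(predicted, targets)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```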

3. Effect

On the games Breakout, Enduro, and Pong, the model performs better than humans, and on Beam Rider its performance approaches human level. But on Q*bert, Seaquest, and Space Invaders it falls far short of human performance, mainly because these games require the network to find a strategy that extends over very long time scales.
[Table: average scores of the model compared with prior methods and human players on the seven Atari games]
HNeat Best refers to a hand-designed object detection algorithm that outputs the category and position of objects on the Atari screen.

4. Other: TD-Gammon

The paper notes that TD-Gammon performed well only in backgammon, not in other games, perhaps because the randomness of the dice rolls helps explore the state space and makes the value function particularly smooth.

Temporal difference (TD) learning is a prediction-based machine learning method, used mainly for reinforcement learning problems and often described as a combination of Monte Carlo ideas and dynamic programming (DP) ideas. TD resembles Monte Carlo methods in that it learns by sampling the environment under some policy; it is related to dynamic programming in that it updates its current estimate based on previously learned estimates (bootstrapping). TD learning algorithms are also related to models of animal learning.
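Concretely, tabular TD(0) moves each value estimate toward the bootstrapped target: V(s) ← V(s) + α·[r + γ·V(s′) − V(s)]. A minimal sketch, where the state names, step size α, and discount γ are illustrative:

```python
# Minimal sketch of a tabular TD(0) value update (illustrative, not from the article).
from collections import defaultdict

def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    # Bootstrapping: the target uses the current estimate of the next state's value.
    td_target = r + gamma * V[s_next]
    td_error = td_target - V[s]       # the temporal-difference error
    V[s] += alpha * td_error

V = defaultdict(float)                # value estimates start at 0
td0_update(V, s="A", r=1.0, s_next="B")
print(V["A"])                         # 0.1 after one update
```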

In 1992, Gerald Tesauro wrote TD-Gammon, a program that uses an artificial neural network as its model and is trained with the TD(λ) algorithm. Through extensive self-play, TD-Gammon reached top human level; because no human players took part in its training, its style of play differs from that of human chess players. TD-Gammon's significance lies not only in its use of reinforcement learning for training: it also proved that, without any feature engineering, simply feeding board positions into a neural network can train an agent that reaches the level of top human players.



Source: blog.csdn.net/zephyr_wang/article/details/105020325