CarRacing DQN: Deep Q-learning for training self-driving cars

Introduction

In the field of reinforcement learning, training a CarRacing 2D agent capable of autonomous driving is a fascinating challenge. In this blog, we'll dive into a Deep Q-Learning (DQN) implementation built with TensorFlow and Keras and train a model capable of navigating CarRacing's virtual race track.

DQN algorithm principle

Q-values and the Bellman equation

The Q-value (the expected cumulative reward for a state-action pair) is defined by the Bellman equation:
$$Q(s, a) = r(s, a) + \gamma \max_{a' \in A} Q(s', a')$$

  • $s$ is the current state
  • $a$ is the action taken
  • $r(s, a)$ is the reward received after taking action $a$ in state $s$
  • $s'$ is the next state
  • $A$ is the action space
  • $\gamma$ is the discount factor, which weighs the importance of future rewards
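
As a quick numeric check of the formula, here is a minimal sketch in Python; the reward, discount factor, and next-state Q-values are made-up numbers for illustration only.

import numpy as np

# Made-up numbers for illustration: reward 0.5, discount 0.95,
# and hypothetical Q-values of the next state for three actions.
reward = 0.5
gamma = 0.95
q_next = np.array([1.2, 0.8, 2.0])   # Q(s', a') for each a' in A

# Bellman target: r(s, a) + gamma * max over a' of Q(s', a')
td_target = reward + gamma * np.max(q_next)
print(td_target)  # 0.5 + 0.95 * 2.0 = 2.4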

DQN structure

DQN combines Q-learning with deep learning, replacing the Q-table with a neural network. The model is structured as follows:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense
from tensorflow.keras.optimizers import Adam

# Input: a stack of frame_stack_num consecutive 96x96 grayscale frames
model = Sequential()
model.add(Conv2D(filters=6, kernel_size=(7, 7), strides=3, activation='relu', input_shape=(96, 96, self.frame_stack_num)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Conv2D(filters=12, kernel_size=(4, 4), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(216, activation='relu'))
# One linear output per discrete action: its predicted Q-value
model.add(Dense(len(self.action_space), activation=None))
model.compile(loss='mean_squared_error', optimizer=Adam(learning_rate=self.learning_rate, epsilon=1e-7))
  • The input is a stack of three consecutive top-view frames, each 96×96 pixels
  • Convolutional and max-pooling layers extract image features
  • The final fully connected layer outputs one Q-value per action in the discrete action space (a possible discretization is sketched below)
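
The post does not show how self.action_space is defined. A common approach for CarRacing, and the assumption used in the sketches below, is to discretize the continuous (steering, gas, brake) control into a small list of fixed actions; the exact values here are hypothetical.

# Hypothetical discretization of CarRacing's continuous control
# (steering, gas, brake); the original agent's exact list is not shown.
action_space = [
    (-1.0, 1.0, 0.0),  # steer left, full gas
    ( 0.0, 1.0, 0.0),  # straight, full gas
    ( 1.0, 1.0, 0.0),  # steer right, full gas
    (-1.0, 0.0, 0.0),  # steer left, coast
    ( 0.0, 0.0, 0.0),  # coast
    ( 1.0, 0.0, 0.0),  # steer right, coast
    ( 0.0, 0.0, 0.8),  # brake
]
# The network's output layer then has len(action_space) units.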

Training process design

Experience Replay

To break the temporal correlation between consecutive samples, experience replay stores past transitions in a replay buffer (the experience pool) and trains on random minibatches drawn from it.

def memorize(self, state, action, reward, next_state, done):
    # Store the transition, recording the action by its index in the discrete action space
    self.memory.append((state, self.action_space.index(action), reward, next_state, done))
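
The buffer itself and the exploration policy are not shown in the post. Below is a minimal sketch of what memorize() and replay() assume: a bounded deque as the replay buffer and an epsilon-greedy act() method. The class name, buffer size, and epsilon schedule are assumptions, not the author's settings.

import random
from collections import deque

import numpy as np

class AgentSketch:  # hypothetical fragment of the agent class
    def __init__(self, model, action_space):
        self.model = model                 # the Keras network built above
        self.action_space = action_space   # list of discrete actions
        self.memory = deque(maxlen=5000)   # bounded replay buffer
        self.epsilon = 1.0                 # exploration rate
        self.epsilon_min = 0.1
        self.epsilon_decay = 0.9999

    def act(self, state):
        # Epsilon-greedy: explore with probability epsilon, otherwise
        # pick the action with the highest predicted Q-value.
        if np.random.rand() < self.epsilon:
            return random.choice(self.action_space)
        q_values = self.model.predict(np.expand_dims(state, axis=0))[0]
        return self.action_space[int(np.argmax(q_values))]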

Target Network

A target network is introduced to slow down changes in the TD target and stabilize training; its weights are periodically copied from the online network.

def update_target_model(self):
    # Copy the online network's weights into the target network
    self.target_model.set_weights(self.model.get_weights())
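
The post does not show how self.target_model is created. A minimal sketch, assuming it is built in the agent's constructor as a structural copy of the online network:

from tensorflow.keras.models import clone_model

# Assumed constructor fragment: the target network starts as a copy of the
# online network and is only resynchronized occasionally via update_target_model()
self.target_model = clone_model(self.model)
self.target_model.set_weights(self.model.get_weights())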

Training loop

def replay(self, batch_size):
    # Sample a random minibatch from the replay buffer
    minibatch = random.sample(self.memory, batch_size)
    train_state = []
    train_target = []
    for state, action_index, reward, next_state, done in minibatch:
        # Start from the online network's current predictions so that only
        # the taken action's Q-value is changed by the update
        target = self.model.predict(np.expand_dims(state, axis=0))[0]
        if done:
            target[action_index] = reward
        else:
            # Bootstrap the TD target from the target network
            t = self.target_model.predict(np.expand_dims(next_state, axis=0))[0]
            target[action_index] = reward + self.gamma * np.amax(t)
        train_state.append(state)
        train_target.append(target)
    # One gradient pass over the minibatch
    self.model.fit(np.array(train_state), np.array(train_target), epochs=1, verbose=0)

In each training step, a batch of transitions is randomly sampled from the experience pool, the target Q-values are computed, and the model weights are updated.
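
The outer loop that drives the environment is not shown in the post. Here is a hedged sketch of how the pieces could fit together; agent stands for an instance of the agent class with the methods shown above, and the episode count, batch size, frame preprocessing, and target-sync interval are assumptions. It also assumes the classic gym API, where env.reset() returns only the observation and env.step() returns four values.

import gym
import numpy as np

def process_state_image(frame):
    # Hypothetical preprocessing: convert the 96x96 RGB frame to grayscale in [0, 1]
    return np.dot(frame[..., :3], [0.299, 0.587, 0.114]) / 255.0

env = gym.make('CarRacing-v0')

for episode in range(600):
    frame = process_state_image(env.reset())
    frame_stack = [frame] * 3                    # three stacked frames
    done = False
    while not done:
        state = np.stack(frame_stack, axis=2)    # shape (96, 96, 3)
        action = agent.act(state)
        next_frame, reward, done, _ = env.step(action)
        frame_stack = frame_stack[1:] + [process_state_image(next_frame)]
        next_state = np.stack(frame_stack, axis=2)
        agent.memorize(state, action, reward, next_state, done)
        if len(agent.memory) > 64:
            agent.replay(batch_size=64)
    agent.epsilon = max(agent.epsilon_min, agent.epsilon * agent.epsilon_decay)
    if episode % 5 == 0:                         # assumed sync interval
        agent.update_target_model()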

Training results and model evolution

Through training, we observe that the model gradually learns to navigate the track:

After 400 training episodes

The model struggled with sharp turns and occasionally left the track.

After 500 training episodes

The model became more proficient, made fewer mistakes, and drove more smoothly.

After 600 training episodes

The model became reckless in its greed for reward, leaving the track on sharp turns.

Summary

This blog walked through the process of training a self-driving agent with the DQN algorithm. Through experience replay and a target network, the model gradually learns better Q-value estimates and, with them, better navigation strategies. Deep Q-learning offers a powerful and flexible method for decision-making in complex environments and provides new ideas for research and applications in autonomous driving.
