1. Deep Reinforcement Learning
Reinforcement Learning Scenario
The difference between supervised learning and reinforcement learning:
Training a chatbot with reinforcement learning: let two agents talk to each other (sometimes producing good dialogues, sometimes bad ones).
With this approach we can generate many dialogues, then use some pre-defined rules to evaluate how good each conversation is.
Interactive Retrieval
Reward delay: in Space Invaders, only "fire" yields a reward, yet the movements before firing matter. In Go, it may be better to sacrifice immediate reward for greater long-term reward. Also, the agent's behavior affects the data it subsequently receives.
Deep reinforcement learning approaches can be divided into two families: policy-based (learning an actor) and value-based (learning a critic).
Policy-based Approach: Learning an Actor
Neural Networks as Actors
Input of the neural network: the machine's observation, represented as a vector or a matrix.
Output of the neural network: each action corresponds to one neuron in the output layer.
In practice the action is stochastic, not fixed: the actor samples from the output distribution rather than simply taking the action with the largest score.
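The actor described above can be sketched as a tiny policy network; the layer sizes and the number of actions here are hypothetical, and numpy stands in for a real deep-learning framework:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: an 8-dim observation, 3 actions (e.g. left, right, fire).
obs_dim, hidden_dim, n_actions = 8, 16, 3
W1 = rng.normal(scale=0.1, size=(obs_dim, hidden_dim))
W2 = rng.normal(scale=0.1, size=(hidden_dim, n_actions))

def policy(obs):
    """Forward pass: observation vector in, one probability per action out."""
    h = np.tanh(obs @ W1)
    logits = h @ W2
    exp = np.exp(logits - logits.max())   # numerically stable softmax
    return exp / exp.sum()

obs = rng.normal(size=obs_dim)
probs = policy(obs)

# The actor SAMPLES an action from the distribution instead of taking the argmax.
action = rng.choice(n_actions, p=probs)
```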
What is the benefit of using a network instead of a lookup table? Generalization: even observations the actor has never seen can produce reasonable results.
Note that each action is weighted by the total reward R(τ) of the whole trajectory τ, not by a single immediate reward r, because one action influences all of the results that follow it.
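The point above, in a minimal sketch: every action in a trajectory shares the same weight R(τ). The reward sequence is made up for illustration.

```python
import numpy as np

# Hypothetical trajectory: the per-step rewards collected while following the actor.
rewards = np.array([0.0, 0.0, 1.0, 0.0, 5.0])

# Every action in the trajectory is weighted by the SAME total return R(tau),
# not by its own immediate reward r_t, since an action affects everything after it.
R_tau = rewards.sum()
weights = np.full(len(rewards), R_tau)
```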
Dividing by the probability p (i.e., taking the gradient of log p rather than of p) acts as a normalization: without it, actions that happen to be sampled more often would be favored even when their scores are lower.
If rewards are always positive, every sampled action gets pushed up, so the probability of unsampled actions can only fall. How do we fix this? Subtract a baseline b from the reward, so the weight R(τ) - b can be either positive or negative.
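A minimal sketch of the baseline trick, using the batch mean of the returns as b (the returns are made up for illustration):

```python
import numpy as np

# Hypothetical returns R(tau) from a batch of sampled trajectories.
returns = np.array([6.0, 2.0, 4.0])

# With all-positive returns, sampled actions are always pushed up, so
# unsampled ones can only lose probability. Subtracting a baseline b
# (here the batch mean) makes the weights both positive and negative.
b = returns.mean()
advantages = returns - b
```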
The critic is a function that depends on the actor π being evaluated; it is represented by a neural network. State-value function V^π(s): the expected cumulative reward obtained after seeing observation (state) s when using actor π.
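One simple way to estimate V^π(s) is Monte Carlo: average the cumulative rewards observed after visiting s while following π. The rollout data below is hypothetical.

```python
import numpy as np

# Hypothetical data: cumulative rewards observed after visiting state "s"
# in several rollouts while following actor pi.
rollout_returns = {"s": [10.0, 6.0, 8.0]}

# Monte Carlo estimate of the state-value function: V^pi(s) is approximated
# by the average cumulative reward over the rollouts that visited s.
V = {s: float(np.mean(g)) for s, g in rollout_returns.items()}
```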
2. Small Models
Model Compression
Networks can be pruned
Networks are often over-parameterized (lots of redundant weights or neurons), so we can prune them!
Importance of weights: absolute value, lifetime...
Importance of a neuron: number of times it is non-zero on a given dataset...
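The two importance criteria above can be sketched with numpy; the weight matrix and activations are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical weight matrix of a layer and ReLU activations of its 5 output
# neurons recorded over 100 examples of a dataset.
W = rng.normal(size=(4, 5))
activations = rng.normal(size=(100, 5)).clip(min=0)

# Weight importance: absolute value (magnitude pruning criterion).
weight_importance = np.abs(W)

# Neuron importance: number of times it is non-zero on the dataset.
neuron_importance = (activations != 0).sum(axis=0)
```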
After pruning, accuracy drops (hopefully not too much).
To recover, fine-tune on the training data. Don't prune too much at once, or the network won't recover: prune a little, fine-tune, and repeat.
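The iterative prune-then-fine-tune loop can be sketched as follows. The pruning ratio and the weight matrix are hypothetical, and `fine_tune` is a placeholder for actual retraining:

```python
import numpy as np

rng = np.random.default_rng(2)
W = rng.normal(size=(8, 8))
mask = np.ones_like(W, dtype=bool)

def fine_tune(W, mask):
    # Placeholder: a real implementation would run a few epochs of gradient
    # descent on the training data, keeping pruned weights at zero.
    return W * mask

# Prune gradually (here 20% of the remaining weights per round) instead of
# all at once, fine-tuning after each round so accuracy can recover.
for _ in range(3):
    alive = np.abs(W[mask])
    threshold = np.quantile(alive, 0.2)   # magnitude-based criterion
    mask &= np.abs(W) > threshold
    W = fine_tune(W, mask)
```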
In fact, the irregular (sparse) network left by weight pruning is hard to accelerate on a GPU, so pruning does not necessarily give a speedup.
After removing whole neurons instead, the network stays regular (the weight matrices simply become smaller but remain dense), so it can actually be accelerated.
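Why neuron pruning stays regular: removing a hidden neuron deletes one column of the layer before it and the matching row of the layer after it, leaving smaller dense matrices. A sketch with hypothetical layer sizes:

```python
import numpy as np

rng = np.random.default_rng(3)

# Two layers of a hypothetical MLP: 6 -> 5 -> 4.
W1 = rng.normal(size=(6, 5))
W2 = rng.normal(size=(5, 4))

# Pruning hidden neuron 2 removes column 2 of W1 and row 2 of W2.
# The result is a smaller but still DENSE network, which GPUs can accelerate,
# unlike the irregular sparsity left by per-weight pruning.
keep = [0, 1, 3, 4]
W1_small = W1[:, keep]
W2_small = W2[keep, :]
```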
How about simply training a smaller network from scratch? It is well known that smaller networks are harder to train successfully. Why are larger networks easier to optimize?
One explanation (the Lottery Ticket Hypothesis): a large network contains many sub-networks, and as long as one sub-network trains well, the large network trains well.
The small network cannot be trained directly from a random initialization, but the sub-network obtained by pruning the large network can be trained.