It is often said that to understand reinforcement learning, one must first understand the Markov property.
The Markov property
Given a process's current state (present), its future evolution (future) does not depend on its past evolution (past).
Markov processes are divided into three types according to whether the state and time parameters are continuous or discrete:
- Discrete time and discrete state: a Markov chain
- Continuous time and continuous state: a Markov process
- Continuous time and discrete state: a continuous-time Markov chain
N-step transition probability matrix:
P(n) = P(n-1)P(1) = P(n-2)P(1)P(1) = … = P(1)^n
So the probability of moving from one state to any other state in n steps can be expressed in matrix form: raise the one-step matrix to the n-th power.
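As a sketch of the formula above (the 2-state chain and its numbers are a toy example of my own, not from the original post), the n-step matrix is just the one-step matrix multiplied by itself n times:

```python
def mat_mul(A, B):
    """Multiply two square matrices given as lists of rows."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def n_step(P, n):
    """n-step transition matrix: P(n) = P(1)^n."""
    result = P
    for _ in range(n - 1):
        result = mat_mul(result, P)
    return result

# toy 2-state chain (hypothetical numbers); each row sums to 1
P = [[0.9, 0.1],
     [0.5, 0.5]]
P2 = n_step(P, 2)   # ≈ [[0.86, 0.14], [0.70, 0.30]]
```

Note that every row of P(n) still sums to 1, since each row is a probability distribution over next states.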
Hidden Markov Model
Consider three dice with 4, 6, and 8 faces respectively. From a sequence of observed numbers 1–8 (the visible states), we want to infer the sequence of dice that was used (the hidden sequence). The classic problems:
- Direct multiplication — compute the probability of generating a given sequence by multiplying the step-by-step probabilities.
- Cracking the dice sequence — start from the first observation and work forward step by step (the forward algorithm), or start from the last state and work backward to the first (the backward algorithm). These are used to compute the total probability, summed over all possible hidden sequences, of producing the observed sequence.
- Viterbi algorithm — used to compute the most likely hidden-state sequence that produced the visible states.
- Baum-Welch algorithm — too complicated; I haven't studied it yet.
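A minimal Viterbi sketch for the dice example above. The uniform start and transition probabilities are my own assumption (the original post does not give them); the emission probabilities simply assume fair dice:

```python
# hidden states: which die is in hand; observations: the rolled number 1..8
states = ["D4", "D6", "D8"]
start_p = {s: 1 / 3 for s in states}                        # assumed uniform start
trans_p = {s: {t: 1 / 3 for t in states} for s in states}   # assumed uniform switching
emit_p = {                                                  # fair dice: 1/faces per reachable number
    "D4": {n: 1 / 4 for n in range(1, 5)},
    "D6": {n: 1 / 6 for n in range(1, 7)},
    "D8": {n: 1 / 8 for n in range(1, 9)},
}

def viterbi(obs):
    """Return the most likely hidden-state sequence for the observations."""
    # V[t][s] = best probability of any path ending in state s at time t
    V = [{s: start_p[s] * emit_p[s].get(obs[0], 0.0) for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p] * trans_p[p][s] * emit_p[s].get(obs[t], 0.0), p)
                for p in states
            )
            V[t][s] = prob
            back[t][s] = prev
    # trace back from the best final state
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        last = back[t][last]
        path.append(last)
    return path[::-1]
```

For example, `viterbi([1, 6, 8])` returns `["D4", "D6", "D8"]`: a 1 is most likely from the 4-sided die (probability 1/4 beats 1/6 and 1/8), a 6 rules out D4, and an 8 can only come from D8.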
Reinforcement learning
The following tutorials are good; I'm recording them here for now and will add my own notes once I understand them better.
Epsilon-greedy
http://blog.csdn.net/zjq2008wd/article/details/52860654
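The core idea of epsilon-greedy, in a minimal sketch of my own (not taken from the linked tutorial): with probability ε pick a random action to explore, otherwise exploit the action with the highest current value estimate.

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Pick a random action index with probability epsilon, else the greedy one."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                      # explore
    return max(range(len(q_values)), key=q_values.__getitem__)      # exploit
```

With `epsilon=0` this is purely greedy; in practice ε is often decayed over time so the agent explores early and exploits later.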
Q-learning
http://blog.csdn.net/zjq2008wd/article/details/52767692
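A sketch of the tabular Q-learning update, Q(s,a) ← Q(s,a) + α[r + γ·max_a′ Q(s′,a′) − Q(s,a)], applied to a toy chain environment. The environment and all parameters here are my own illustrative choices, not from the linked tutorial:

```python
import random

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """One Q-learning step; s_next=None marks a terminal transition (no bootstrap)."""
    best_next = max(Q[s_next]) if s_next is not None else 0.0
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

# toy chain: states 0..3 in a line; reaching state 3 gives reward 1 and ends the episode
n_states, LEFT, RIGHT = 4, 0, 1
Q = [[0.0, 0.0] for _ in range(n_states)]

def step(s, a):
    s2 = min(n_states - 1, s + 1) if a == RIGHT else max(0, s - 1)
    done = (s2 == n_states - 1)
    return s2, (1.0 if done else 0.0), done

random.seed(0)
for _ in range(500):
    s = 0
    while True:
        a = random.randrange(2)        # pure random exploration, for the sketch
        s2, r, done = step(s, a)
        q_update(Q, s, a, r, None if done else s2)
        if done:
            break
        s = s2
```

After training, `Q[2][RIGHT] > Q[2][LEFT]`: because Q-learning bootstraps from the best next action, the values propagate back from the rewarding state even though the behavior policy is random (it is off-policy).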
Neural networks and reinforcement learning
http://www.cnblogs.com/Leo_wl/p/5852010.html