A Brief Introduction to Markov Chains [easy to understand]


Foreword

The Markov chain can fairly be called a cornerstone of machine learning and artificial intelligence, with extremely wide applications in reinforcement learning, natural language processing, finance, weather forecasting, and speech recognition.

The future is independent of the past, given the present.

This bit of life philosophy also captures the idea behind the Markov chain: all of the information from the past has been stored in the current state, so the future can be predicted from the present alone.

It may be an extreme way to put it, but it greatly reduces the complexity of a model, which is why Markov chains are widely used in time-series models such as recurrent neural networks (RNNs) and hidden Markov models (HMMs); MCMC methods also rely on them.

Stochastic Processes

The Markov chain is part of the study of stochastic processes, so let's take a brief look at those first.

To put it simply, a stochastic process uses statistical models to predict and analyze how things evolve over time. Stock price prediction, for example, uses today's rises and falls to predict tomorrow's and the day after tomorrow's; weather forecasting uses today's conditions to predict the chance of rain in the days ahead. These processes can be quantified and computed with mathematical formulas: given the probabilities of rain or of stock movements, the situation N days from now can be derived with a formula.

Markov chain

Introduction

The Russian mathematician Andrey Andreyevich Markov studied and proposed a general mathematical model of naturally changing processes, which is named after him: the Markov chain. A Markov chain is a stochastic process that transitions from one state to another within a state space. The process is required to be "memoryless": the probability distribution of the next state is determined by the current state alone, and none of the events that precede it in the time series are relevant to it. This particular kind of "memorylessness" is called the Markov property.

(Figure: Andrey Markov)

A Markov chain holds that all of the information from the past is stored in the current state. For example, given the number sequence 1 - 2 - 3 - 4 - 5 - 6, from the Markov chain's point of view the state 6 is related only to the 5 before it and has nothing to do with any earlier part of the sequence.

Mathematical Definition

Suppose our sequence of states is $\ldots, X_{t-2}, X_{t-1}, X_t, X_{t+1}, \ldots$. Then the conditional probability of the state at time $t+1$ depends only on the state $X_t$ at the previous moment, that is:

$$P(X_{t+1} \mid \ldots, X_{t-2}, X_{t-1}, X_t) = P(X_{t+1} \mid X_t)$$

Since the probability of a state transition at any moment depends only on the state immediately before it, a Markov chain model is fully determined once we know the transition probability between every pair of states in the system.
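As a minimal sketch (with toy transition probabilities made up for illustration), here is how such a chain can be simulated: sampling the next state uses only the current state's row of transition probabilities, and nothing from earlier history.

import numpy as np

# Toy two-state chain (states 0 and 1); each row gives the transition
# probabilities out of one state and sums to 1.
P = np.array([[0.3, 0.7],
              [0.9, 0.1]])

rng = np.random.default_rng(0)
state = 0          # arbitrary starting state
path = [state]
for _ in range(10):
    # The next state depends only on the current state -- the Markov property.
    state = int(rng.choice(2, p=P[state]))
    path.append(state)
print(path)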

Transition Probability Matrix

Using the Markov chain model, we can describe the transitions between the states of an event as a probability matrix (also known as a state distribution matrix). Consider a simple example with two states, A and B: the probability of A→A is 0.3 and of A→B is 0.7; the probability of B→B is 0.1 and of B→A is 0.9.

  • Starting in state A, what is the probability that the state is again A after 2 moves? Simple:
    P = (A→A→A) + (A→B→A) = 0.3 × 0.3 + 0.7 × 0.9 = 0.72
  • What if we want the probabilities of every state after two moves, for any initial state? This is where the transition probability matrix comes in: it describes all of these probabilities at once.

    With the transition matrix in hand, we can read off the following conclusions:
    • Starting in A, the probability of being in A after 2 moves is 0.72;
    • Starting in A, the probability of being in B after 2 moves is 0.28;
    • Starting in B, the probability of being in A after 2 moves is 0.36;
    • Starting in B, the probability of being in B after 2 moves is 0.64;
  • With the transition matrix, it is just as easy to compute the state probabilities after n moves, as the sketch below shows.
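To check these numbers, here is a minimal NumPy sketch (not from the original article): squaring the transition matrix gives all two-step probabilities at once, since the entry in row i, column j of P² is the probability of moving from state i to state j in two moves.

import numpy as np

# Row = current state, column = next state (order: A, B).
P = np.array([[0.3, 0.7],   # from A: P(A->A)=0.3, P(A->B)=0.7
              [0.9, 0.1]])  # from B: P(B->A)=0.9, P(B->B)=0.1

P2 = np.linalg.matrix_power(P, 2)  # two-step transition probabilities
print(P2)
# [[0.72 0.28]   A->A: 0.72, A->B: 0.28
#  [0.36 0.64]]  B->A: 0.36, B->B: 0.64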

More complex cases with more than two states work exactly the same way, as the three-state example below shows.

Stability of the State Transition Matrix

The state transition matrix has a very important property: after enough transitions, the state distribution eventually converges to a stable probability distribution that has nothing to do with the initial state distribution. For example:

Suppose the current probability distribution of the stock market is $[0.3, 0.4, 0.3]$, that is, a 30% probability of a bull market, a 40% probability of a bear market, and a 30% probability of a sideways market. Taking this as the initial state distribution at time $t_0$ and repeatedly applying the state transition matrix gives the distributions at $t_1, t_2, t_3, \ldots$. The code is as follows:

import numpy as np

# State transition matrix for the three market states
# (bull, bear, sideways); each row sums to 1.
matrix = np.array([[0.9, 0.075, 0.025],
                   [0.15, 0.8, 0.05],
                   [0.25, 0.25, 0.5]])
# Initial state distribution at t0: [bull, bear, sideways].
vector1 = np.array([[0.3, 0.4, 0.3]])

for i in range(100):
    vector1 = vector1 @ matrix  # apply one transition
    print('Current round: {}'.format(i + 1))
    print(vector1)

Output result:

Current round: 1
[[ 0.405   0.4175  0.1775]]
Current round: 2
[[ 0.4715   0.40875  0.11975]]
Current round: 3
[[ 0.5156  0.3923  0.0921]]
Current round: 4
[[ 0.54591   0.375535  0.078555]]
...
Current round: 58
[[ 0.62499999  0.31250001  0.0625    ]]
Current round: 59
[[ 0.62499999  0.3125      0.0625    ]]
Current round: 60
[[ 0.625   0.3125  0.0625]]
...
Current round: 99
[[ 0.625   0.3125  0.0625]]
Current round: 100
[[ 0.625   0.3125  0.0625]]

It can be seen that from round 60 onward the state probability distribution no longer changes, staying at $[0.625, 0.3125, 0.0625]$: a 62.5% probability of a bull market, 31.25% of a bear market, and 6.25% of a sideways market.

This property holds not only for this particular state transition matrix but for most state transition matrices of Markov chain models, and it applies to continuous state spaces as well as discrete ones.
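The limiting distribution can also be computed directly instead of by iterating. A stationary distribution π satisfies π = πP, so it is a left eigenvector of the transition matrix with eigenvalue 1, normalized to sum to 1. A minimal sketch using the same matrix as above:

import numpy as np

matrix = np.array([[0.9, 0.075, 0.025],
                   [0.15, 0.8, 0.05],
                   [0.25, 0.25, 0.5]])

# A left eigenvector of `matrix` is a right eigenvector of its transpose.
eigvals, eigvecs = np.linalg.eig(matrix.T)
idx = np.argmin(np.abs(eigvals - 1.0))  # pick the eigenvalue closest to 1
pi = np.real(eigvecs[:, idx])
pi /= pi.sum()                          # normalize to a probability vector
print(pi)  # ~[0.625 0.3125 0.0625], matching the iteration above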

For a more detailed treatment, see: https://zhuanlan.zhihu.com/p/38764470

Examples of Non-Markov Chain Processes

A process is a Markov chain only if it satisfies the Markov property. Consider, for example, drawing balls from a bag without replacement:

Here the probability of the current draw depends not only on the color of the ball drawn last time but on the color of every ball drawn before that, so this process is not a Markov chain.

If, however, each ball is put back into the bag after it is drawn, the next draw no longer depends on the history at all, and the ball-from-the-bag problem does set up a Markov stochastic process.
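A minimal sketch (with made-up bag contents) makes the history dependence of the without-replacement case concrete: two histories that end with the same draw give different probabilities for the next draw, so the last state alone is not enough.

from fractions import Fraction

# Hypothetical bag: 2 red balls and 1 blue ball, drawn without replacement.
bag = {"red": 2, "blue": 1}

def prob_next(color, history):
    # Probability that the next ball is `color`, given every previous draw.
    remaining = dict(bag)
    for c in history:
        remaining[c] -= 1
    return Fraction(remaining[color], sum(remaining.values()))

# Both histories end with a red draw, yet the next-draw probabilities differ:
print(prob_next("red", ["red"]))          # 1/2
print(prob_next("red", ["blue", "red"]))  # 1  -- not Markovian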

Applications of Markov Chains in Machine Learning

Natural language processing research aims to let machines "understand" human language, and Markov models help solve two of its core problems:

Language model: the N-Gram is a simple and effective language model built on a Markov-style independence assumption: the appearance of the N-th word depends only on the preceding N-1 words and is unrelated to any other word, and the probability of a whole sentence is the product of the conditional probabilities of its words. These probabilities can be obtained directly from a corpus by counting how often the N words occur together.
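As a minimal sketch of the idea (with a made-up toy corpus and N = 2), the counts below estimate P(word | previous word), and a sentence's probability is the product of those conditional probabilities:

from collections import Counter, defaultdict

# Toy corpus, invented for illustration.
corpus = "the cat sat on the mat the cat ate".split()

unigram_counts = Counter(corpus)
bigram_counts = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    bigram_counts[prev][word] += 1

def bigram_prob(prev, word):
    # P(word | prev) = count(prev word) / count(prev)
    return bigram_counts[prev][word] / unigram_counts[prev]

# Probability of "the cat sat", given that it starts with "the":
sentence = ["the", "cat", "sat"]
p = 1.0
for prev, word in zip(sentence, sentence[1:]):
    p *= bigram_prob(prev, word)
print(p)  # P(cat|the) * P(sat|cat) = 2/3 * 1/2 = 1/3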

Acoustic model: modeled with an HMM (hidden Markov model). In an HMM the internal states of the Markov model are invisible to the outside world; the outside world only sees the output value at each moment. For a speech recognition system, the output values are usually the acoustic features computed from individual frames.
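As a minimal sketch (with made-up toy probabilities, not a real acoustic model), the HMM "forward" recursion below computes the likelihood of an observed output sequence while the hidden states stay invisible:

import numpy as np

A = np.array([[0.7, 0.3],    # hidden-state transition probabilities
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],    # emission probabilities P(observation | state)
              [0.2, 0.8]])
pi = np.array([0.5, 0.5])    # initial hidden-state distribution

obs = [0, 1, 1]              # the only thing the outside world sees

# alpha[i] = P(observations so far, current hidden state = i)
alpha = pi * B[:, obs[0]]
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]
print(alpha.sum())           # likelihood of the whole observation sequence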



Source: blog.csdn.net/a1097304791/article/details/122088595