Machine Learning: Markov Models

Worked examples will be added in the future as suitable cases are encountered.

1 Markov model

  The Markov model (MM) is a statistical model widely used in natural language processing and many other fields.

1.1 Mathematical definition

  Consider a sequence of random variables $X=\{X_{0},X_{1},\dots,X_{t},\dots\}$, where $X_{t}$ is the random variable at time $t$. Each $X_{t}$ takes values in the same set $S$, called the state space; $S$ may be discrete or continuous.
  Suppose that at time $0$ the random variable $X_{0}$ follows the probability distribution $P(X_{0})=\pi(0)$, called the initial state distribution. If at every time $t\ge 1$ the random variable $X_{t}$ has a conditional distribution $F(X_{t}\mid X_{t-1})$ given the previous variable $X_{t-1}$, and $X_{t}$ depends only on $X_{t-1}$, independent of the earlier variables $(X_{0},X_{1},\dots,X_{t-2})$, then $X$ has the Markov property and is called a Markov chain. That is,

$$P(X_{t}\mid X_{0},X_{1},\dots,X_{t-1})=P(X_{t}\mid X_{t-1}),\quad t=1,2,\dots$$

where $P(X_{t}\mid X_{t-1})$ is called the transition probability distribution of the Markov chain.
  In addition, if the conditional transition probability distribution does not depend on the time $t$, the chain is called a time-homogeneous Markov chain. That is, for any $s$,

$$P(X_{t+s}\mid X_{t+s-1})=P(X_{t}\mid X_{t-1})$$

  If at some time $t\ge n$ the random variable $X_{t}$ depends on the previous $n$ states, the chain is called an $n$th-order Markov chain. That is,

$$P(X_{t}\mid X_{0},\dots,X_{t-1})=P(X_{t}\mid X_{t-n},X_{t-n+1},\dots,X_{t-1})$$
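The definitions above can be sketched in code. The following is a minimal simulation of a time-homogeneous (first-order) Markov chain; the two-state "weather" example (0 = sunny, 1 = rainy) and its transition probabilities are hypothetical, chosen only to illustrate that each step depends solely on the current state:

```python
import random

# Following the text's convention, P[i][j] = P(X_t = i | X_{t-1} = j),
# so each *column* j of P sums to 1.
P = [[0.9, 0.5],   # P(sunny | sunny), P(sunny | rainy)
     [0.1, 0.5]]   # P(rainy | sunny), P(rainy | rainy)

def step(state):
    """Sample X_t given X_{t-1} = state, using column `state` of P."""
    u = random.random()
    return 0 if u < P[0][state] else 1

def simulate(x0, t):
    """Generate a sample path X_0, X_1, ..., X_t.

    Because `step` looks only at the last state, the path has the
    Markov property by construction.
    """
    path = [x0]
    for _ in range(t):
        path.append(step(path[-1]))
    return path

random.seed(0)
path = simulate(0, 10)
print(path)  # a sample path of 11 states, each 0 or 1
```

Because the same matrix `P` is used at every step, the simulated chain is time-homogeneous; an $n$th-order chain would instead require `step` to look at the last $n$ states.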

  Beyond the Markov property itself, Markov chains may also possess properties such as irreducibility, recurrence, periodicity, and ergodicity.

1.2 Two kinds of Markov chains
1.2.1 Discrete Markov chain

  If the random variables $X_{t}\ (t=0,1,2,\dots)$ are defined on a discrete state space $S$, the chain is called a discrete Markov chain, and its transition probability distribution can be expressed as a matrix. If $S=\{1,2,\dots,n\}$, the transition probability matrix is:

$$P=\begin{bmatrix} p_{11} & p_{12} & \dots & p_{1n} \\ p_{21} & p_{22} & \dots & p_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ p_{n1} & p_{n2} & \dots & p_{nn} \end{bmatrix} \tag{1}$$

where $p_{ij}=P(X_{t}=i\mid X_{t-1}=j)$ is the probability that the chain moves from state $j$ at time $t-1$ to state $i$ at time $t$, with $p_{ij}\ge 0$ and $\sum_{i}p_{ij}=1$ (each column of $P$ sums to $1$).
  The state distribution of the Markov chain at any time $t$ is determined by the state distribution at time $t-1$ and the transition probability matrix, that is, $\pi(t)=P\pi(t-1)=P\cdot P\pi(t-2)$. Iterating gives $\pi(t)=P^{t}\pi(0)$.
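The recursion $\pi(t)=P\pi(t-1)$ can be verified directly. The sketch below uses a hypothetical 3-state transition matrix (column-stochastic, as in the text) and computes $\pi(t)=P^{t}\pi(0)$ by repeated matrix-vector products:

```python
# p[i][j] = P(X_t = i | X_{t-1} = j); each column sums to 1.
P = [[0.7, 0.2, 0.1],
     [0.2, 0.6, 0.3],
     [0.1, 0.2, 0.6]]

def apply(P, pi):
    """One step of the recursion: returns P * pi (matrix-vector product)."""
    n = len(pi)
    return [sum(P[i][j] * pi[j] for j in range(n)) for i in range(n)]

def state_distribution(pi0, t):
    """Compute pi(t) = P^t pi(0) by applying P to pi(0) t times."""
    pi = pi0
    for _ in range(t):
        pi = apply(P, pi)
    return pi

pi0 = [1.0, 0.0, 0.0]          # chain starts in state 1 with certainty
print(state_distribution(pi0, 1))   # equals the first column of P
pi50 = state_distribution(pi0, 50)
print(pi50)  # still sums to 1; for this P it is near the stationary distribution
```

Starting from a point mass on state 1, one step yields exactly the first column of $P$, and after many steps the distribution stabilizes (for this particular matrix) near a fixed point $\pi$ with $P\pi=\pi$.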

1.2.2 Continuous Markov chain

  If the state space $S$ is continuous, the sequence $X$ is called a continuous Markov chain, and its transition probability distribution is represented by a probability transition kernel. For any $x\in S$ and any subset $A\subseteq S$, the transition probability is

$$P(x,A)=\int_{A} p(x,y)\,dy$$
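As a concrete sketch of a transition kernel, assume (hypothetically) a Gaussian random-walk chain on $S=\mathbb{R}$ with kernel density $p(x,y)=\mathcal{N}(y;\,x,\,\sigma^{2})$. For an interval $A=[a,b]$, the integral $P(x,A)=\int_{a}^{b}p(x,y)\,dy$ reduces to a difference of normal CDF values, computable with the error function:

```python
import math

SIGMA = 1.0  # standard deviation of the hypothetical random-walk kernel

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def transition_prob(x, a, b, sigma=SIGMA):
    """P(x, A) for A = [a, b] under p(x, y) = N(y; x, sigma^2)."""
    return normal_cdf((b - x) / sigma) - normal_cdf((a - x) / sigma)

# From x = 0, the chain lands in [-1, 1] with probability ~0.683
# (one standard deviation on each side).
print(transition_prob(0.0, -1.0, 1.0))
```

Note that here $P(x,\cdot)$ is a probability measure for each fixed $x$ (it integrates to $1$ over $S$), playing the role that a column of the matrix in (1) plays in the discrete case.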



Origin blog.csdn.net/yeshang_lady/article/details/132102565