1 Markov model
The Markov model (MM) is a statistical model widely used in natural language processing and other fields.
1.1 Mathematical definition
Consider a sequence of random variables $X=\{X_{0},X_{1},\dots,X_{t},\dots\}$, where $X_{t}$ represents the random variable at time $t$. All of the $X_{t}$ take values in the same set $S$, called the state space. $S$ can be discrete or continuous.
Suppose that at time $0$ the random variable $X_{0}$ follows the probability distribution $P(X_{0})=\pi(0)$, called the initial state distribution. If for every time $t\ge 1$ the conditional distribution $P(X_{t}\mid X_{t-1})$ of $X_{t}$ given $X_{t-1}$ depends only on $X_{t-1}$, and is independent of the earlier random variables $(X_{0},X_{1},\dots,X_{t-2})$, then $X$ has the Markov property and is called a Markov chain. That is,

$$P(X_{t}\mid X_{0},X_{1},\dots,X_{t-1})=P(X_{t}\mid X_{t-1}),\quad t=1,2,\dots$$

where $P(X_{t}\mid X_{t-1})$ is called the transition probability distribution of the Markov chain.
Furthermore, if the conditional transition probability distribution is independent of the time $t$, the chain is called a time-homogeneous Markov chain. That is,

$$P(X_{t+s}\mid X_{t+s-1})=P(X_{t}\mid X_{t-1})$$

If the random variable $X_{t}$ depends on the preceding $n$ states, the chain is called an $n$-th order Markov chain. That is,

$$P(X_{t}\mid X_{0}\dots X_{t-1})=P(X_{t}\mid X_{t-n}X_{t-n+1}\dots X_{t-1})$$
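As a concrete sketch of the definitions above, the snippet below simulates a time-homogeneous first-order Markov chain: each step depends only on the previous state, and the same transition table is used at every time $t$. The two "weather" states and their transition probabilities are invented for illustration; they do not come from the text.

```python
import random

# Hypothetical 2-state chain used purely for illustration;
# the transition probabilities below are made up.
states = ["sunny", "rainy"]
# trans[cur][nxt] = P(X_t = nxt | X_{t-1} = cur)
trans = {
    "sunny": {"sunny": 0.8, "rainy": 0.2},
    "rainy": {"sunny": 0.4, "rainy": 0.6},
}

def sample_chain(start, steps, seed=0):
    """Draw a path X_0, ..., X_steps.

    First-order Markov property: each draw conditions only on path[-1].
    Time homogeneity: the same `trans` table is used at every step."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        cur = path[-1]
        nxt = rng.choices(states, weights=[trans[cur][s] for s in states])[0]
        path.append(nxt)
    return path

print(sample_chain("sunny", 10))
```

A second-order chain would instead condition each draw on the last two entries of `path`.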
Besides the Markov property, a Markov chain may also have properties such as irreducibility, recurrence, periodicity, and ergodicity.
1.2 Two kinds of Markov chains
1.2.1 Discrete Markov chain
If the random variables $X_{t}\ (t=0,1,2,\dots)$ above take values in a discrete space $S$, the chain is called a discrete Markov chain, and its transition probability distribution can be represented by a matrix. If $S=\{1,2,\dots,n\}$, the transition probability matrix is:

$$P=\begin{bmatrix} p_{11} & p_{12} & \dots & p_{1n} \\ p_{21} & p_{22} & \dots & p_{2n} \\ \vdots & \vdots & & \vdots \\ p_{n1} & p_{n2} & \dots & p_{nn} \end{bmatrix} \tag{1}$$

where $p_{ij}=P(X_{t}=i\mid X_{t-1}=j)$ is the probability that the chain moves from state $j$ at time $t-1$ to state $i$ at time $t$, with $p_{ij}\ge 0$ and $\sum_{i}p_{ij}=1$ (each column of $P$ sums to $1$).

The state distribution of the chain at any time $t$ is determined by the state distribution at time $t-1$ and the transition probability matrix, that is, $\pi(t)=P\pi(t-1)=P\cdot P\pi(t-2)$. Continuing by induction, $\pi(t)=P^{t}\pi(0)$.
1.2.2 Continuous Markov chain
If the state space $S$ is continuous, the sequence $X$ is called a continuous Markov chain, and its transition probability distribution is represented by a probability transition kernel. For any $x\in S$ and $A\subset S$, the transition probability is

$$P(x,A)=\int_{A} p(x,y)\,dy$$

where $p(x,y)$ is the kernel density.
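As an illustration of a transition kernel, the sketch below numerically evaluates $P(x,A)=\int_{A}p(x,y)\,dy$ for a hypothetical Gaussian random-walk kernel $p(x,y)=\mathcal{N}(y;\,x,\,\sigma^{2})$ on $S=\mathbb{R}$; this kernel is an assumption chosen for the example, not one given in the source.

```python
import numpy as np

SIGMA = 1.0  # assumed step size of the hypothetical random walk

def p(x, y):
    """Kernel density p(x, y): Gaussian centered at the current state x."""
    return np.exp(-(y - x) ** 2 / (2 * SIGMA ** 2)) / np.sqrt(2 * np.pi * SIGMA ** 2)

def P_trans(x, a, b, n=100_000):
    """Transition probability P(x, A) for A = [a, b], via a Riemann sum."""
    y = np.linspace(a, b, n)
    dy = y[1] - y[0]
    return float(np.sum(p(x, y)) * dy)

# From x = 0, the chain lands in A = [-1, 1] with probability
# Phi(1) - Phi(-1), approximately 0.6827.
print(P_trans(0.0, -1.0, 1.0))
```

Note that for each fixed $x$, $P(x,S)=\int_{S}p(x,y)\,dy=1$, the continuous analogue of the columns of the matrix $P$ summing to $1$.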
References
- Hang Li, *Statistical Learning Methods*