Hidden Markov Model (HMM)

I ran into the HMM three or four times as an undergraduate: in machine learning, natural language processing, and Chinese information processing. Now, in my graduate natural language processing course, I have met this old acquaintance again.

Despite these many encounters, my understanding has always felt fragmentary and incomplete. I want to use this opportunity to get to the bottom of this famous model, so that future encounters will no longer trouble me.

Outline

  • Introduction and background
    • Starting from probabilistic graphical models
    • Bayesian networks, Markov models, Markov processes, Markov networks, CRFs
  • Formal representation of the HMM
    • Formal representation of the Markov model
    • Formal representation of the HMM
    • The two basic assumptions of the HMM
  • The three basic problems of the HMM
  • Evaluation
  • Learning
  • Decoding
  • Example

Notes

I. Introduction and background

1.1 Starting from probabilistic graphical models

  When facing a complex problem, a graph is an effective tool: using nothing but nodes and edges, it can express the associations and constraints between entities, and if probabilities are attached to the edges between associated entities, it can further express the strength of those associations.

  In machine learning specifically, a probabilistic graphical model uses graph theory to represent the dependencies between features, between a feature and the class, and between classes. It uses a graph to express the joint probability distribution of the model's variables; in essence, it is a generative model.

  When modeling a real problem, a probabilistic graphical model represents the observed data with observation nodes and the latent knowledge with hidden nodes, describes the relationship between knowledge and data with edges, and finally obtains a probability distribution based on this graph, from which the knowledge hidden in the data can be extracted.

  The nodes of a probability graph are divided into observation nodes and hidden nodes, and its edges into directed and undirected edges. Depending on the type of edge, probabilistic graphical models fall into two categories: Bayesian networks and Markov networks.

  Before introducing the HMM, let us sort out its related concepts, so they are easy to distinguish.

1.2 Bayesian networks, Markov models, Markov processes, Markov random fields, CRFs

  There are many concepts related to "Markov", and in my previous studies they remained fragmented pieces of knowledge. Today let us explore the probability-graph family along one logical chain and integrate these concepts together; reading from top to bottom, you should be able to follow:

  1. Treat each node as a random variable; if two random variables are not independent, connect them with an edge. Given a set of random variables, this yields a probability graph.
  2. If the network is a directed acyclic graph, it is called a Bayesian network.
  3. If the graph degenerates into a linear chain, we obtain the Markov model; since each node is a random variable that changes with time (or space), from the stochastic-process perspective it can be seen as a Markov process. If each state transition depends on the n states before it, the process is called an n-th order Markov process. The most typical example is the Markov assumption introduced in n-gram models: each state depends only on the n-1 states before it, i.e., an (n-1)-th order Markov process.
  4. But the Markov model cannot accurately describe some of the problems we deal with. For example, every morning we can judge today's temperature (hidden state) from the clothes pedestrians wear (observed state); such a Markov process with unobserved states is called a hidden Markov model (HMM).
  5. If the network is undirected, it is an undirected graphical model, known as a Markov random field (MRF) or Markov network.
  6. If we study an MRF under given conditions, we obtain the conditional random field (CRF). Note: the CRF is built on the MRF (an undirected graph), while the HMM is based on a Bayesian network (a directed graph); the basic problems of the two and their computational methods are broadly similar, but the underlying ideas differ.
  7. If the CRF is used to solve labeling problems, and the topology of the network is further restricted to a linear chain, we obtain the linear-chain CRF.
  For the relationships among these concepts, you can refer to the videos by the Bilibili uploader shuahuai008, which I found excellently explained!
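The Markov assumption in point 3 can be made concrete with a small sketch. The toy corpus and variable names below are my own illustration, not from the original: it estimates first-order (bigram) transition probabilities by maximum likelihood, i.e., each word depends only on the word immediately before it.

```python
from collections import Counter, defaultdict

# Toy corpus; under the first-order Markov assumption each word
# depends only on the word immediately preceding it.
corpus = "the cat sat on the mat the cat ran".split()

# Count bigram transitions w_{t-1} -> w_t.
bigrams = Counter(zip(corpus, corpus[1:]))
totals = Counter(corpus[:-1])

# Maximum-likelihood transition probabilities P(w_t | w_{t-1}).
trans = defaultdict(dict)
for (prev, cur), n in bigrams.items():
    trans[prev][cur] = n / totals[prev]

print(trans["the"])  # probabilities of the words that follow "the"
```

An n-gram model would condition on the previous n-1 words instead of just one, giving an (n-1)-th order Markov process.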
 

II. Formal representation of the Hidden Markov Model

  Above we gave a systematic exposition of the related concepts, but the one we need to focus on still requires further explanation. In this section we look more closely at the Hidden Markov Model.

2.1 Formal representation of the Markov model

  A Markov model is a triple $(S, \pi, A)$, where $S$ is the set of states, $\pi$ is the initial state probability distribution, and $A$ is the matrix of transition probabilities between states. The specific meanings of these components are introduced in the formal representation of the HMM below.
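As a minimal sketch of such a triple (the weather states and all probabilities below are my own illustrative choices, not from the original), we can sample a state sequence from $(S, \pi, A)$:

```python
import random

# A Markov model as a triple (S, pi, A).
S = ["sunny", "rainy"]            # state set
pi = [0.6, 0.4]                   # initial state probabilities
A = [[0.7, 0.3],                  # A[i][j] = P(next = S[j] | current = S[i])
     [0.4, 0.6]]

def sample_chain(T, seed=0):
    """Sample a state sequence of length T from (S, pi, A)."""
    rng = random.Random(seed)
    i = rng.choices(range(len(S)), weights=pi)[0]
    seq = [S[i]]
    for _ in range(T - 1):
        # Next state depends only on the current state (Markov property).
        i = rng.choices(range(len(S)), weights=A[i])[0]
        seq.append(S[i])
    return seq

print(sample_chain(5))
```

Note that each row of $A$ sums to 1, since from any state the process must transition somewhere.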

 

 

2.2 Formal representation of HMM

  • Let $\mathbb{Q}=\left\{q_{1}, q_{2}, \cdots, q_{Q}\right\}$ be the set of all possible states, i.e., the value space of the state variable, and $\mathbb{V}=\left\{v_{1}, v_{2}, \cdots, v_{V}\right\}$ the set of all possible observations, i.e., the value space of the observed variable. Here the subscript $Q$ is the number of possible states and $V$ is the number of possible observations; in general $Q$ and $V$ need not be equal.
  • Let $\mathbf{I}=\left(i_{1}, i_{2}, \cdots, i_{T}\right)$ be a state sequence of length $T$, and $\mathbf{O}=\left(o_{1}, o_{2}, \cdots, o_{T}\right)$ the corresponding observation sequence.
    • $i_{t} \in \{1, \cdots, Q\}$ is a random variable representing the state variable $q_{i_{t}}$
    • $o_{t} \in \{1, \cdots, V\}$ is a random variable representing the observed variable $v_{o_{t}}$
  • Let $\mathbf{A}$ be the state transition probability matrix,

 $\mathbf{A}=\left[\begin{array}{cccc}{a_{1,1}} & {a_{1,2}} & {\cdots} & {a_{1, Q}} \\ {a_{2,1}} & {a_{2,2}} & {\cdots} & {a_{2, Q}} \\ {\vdots} & {\vdots} & {\vdots} & {\vdots} \\ {a_{Q, 1}} & {a_{Q, 2}} & {\cdots} & {a_{Q, Q}}\end{array}\right]$

   where $a_{i, j}=P\left(i_{t+1}=j | i_{t}=i\right)$ is the probability of being in state $q_{i}$ at time $t$ and transferring to state $q_{j}$ at time $t+1$.

  • Let $\mathbf{B}$ be the emission matrix, also called the observation probability matrix,

$\mathbf{B}=\left[\begin{array}{cccc}{b_{1}(1)} & {b_{1}(2)} & {\cdots} & {b_{1}(V)} \\ {b_{2}(1)} & {b_{2}(2)} & {\cdots} & {b_{2}(V)} \\ {\vdots} & {\vdots} & {\vdots} & {\vdots} \\ {b_{Q}(1)} & {b_{Q}(2)} & {\cdots} & {b_{Q}(V)}\end{array}\right]$

   where $b_{j}(k)=P\left(o_{t}=k | i_{t}=j\right)$ is the probability of producing the observation $v_{k}$ given state $q_{j}$ at time $t$.

  • Let $\vec{\pi}=\left(\pi_{1}, \pi_{2}, \cdots, \pi_{Q}\right)^{T}$ be the initial state probability distribution, where $\pi_{i}=P\left(i_{1}=i\right)$ is the probability of being in state $q_{i}$ at $t=1$.

 Having defined the above, we can formally define an HMM as a five-tuple $\lambda=(\mathbb{Q}, \mathbb{V}, \vec{\pi}, \mathbf{A}, \mathbf{B})$, where $\mathbf{A}, \mathbf{B}, \vec{\pi}$ are called the three elements of the hidden Markov model:

  • The state transition probability matrix $\mathbf{A}$ and the initial state probability vector $\vec{\pi}$ determine the hidden Markov chain and generate the unobservable state sequence.
  • The observation probability matrix $\mathbf{B}$ determines how the observed variables are generated from the state variables, and together with the state sequence $\mathbf{I}$ determines how the observation sequence is generated.
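To make this generative reading of the three elements concrete, here is a minimal sketch (all states, observations, and probabilities are illustrative assumptions, not from the original) of how $\vec{\pi}$ and $\mathbf{A}$ drive the hidden chain while $\mathbf{B}$ emits the observations:

```python
import random

# Hidden states Q, observations V, and the three elements (pi, A, B).
Q = ["hot", "cold"]
V = ["dry", "damp", "soggy"]
pi = [0.5, 0.5]
A = [[0.8, 0.2],                 # A[i][j] = P(i_{t+1} = j | i_t = i)
     [0.3, 0.7]]
B = [[0.6, 0.3, 0.1],            # B[j][k] = P(o_t = k | i_t = j)
     [0.1, 0.4, 0.5]]

def generate(T, seed=0):
    """Sample a (state sequence I, observation sequence O) pair of length T."""
    rng = random.Random(seed)
    I, O = [], []
    i = rng.choices(range(len(Q)), weights=pi)[0]   # pi picks the first state
    for t in range(T):
        I.append(i)
        # B emits an observation from the current hidden state.
        O.append(rng.choices(range(len(V)), weights=B[i])[0])
        # A moves the hidden chain to the next state.
        i = rng.choices(range(len(Q)), weights=A[i])[0]
    return [Q[i] for i in I], [V[o] for o in O]

states, obs = generate(4)
print(states, obs)
```

Only `obs` would be visible in practice; `states` is exactly the hidden sequence the three basic problems below reason about.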

2.3 The two basic assumptions of the Hidden Markov Model

  • The homogeneous Markov assumption, also known as the limited history assumption: the state of the hidden Markov chain at any time depends only on the state at the previous time, and is independent of all other states and observations, i.e.:

$P\left(i_{t} | i_{t-1}, o_{t-1}, \cdots, i_{1}, o_{1}\right)=P\left(i_{t} | i_{t-1}\right), \quad t=1,2, \cdots, T$

  • The observation independence assumption, also known as the time invariance assumption: the observation at any time depends only on the hidden state at that time, and is independent of all other states and observations, i.e.:

$P\left(o_{t} | i_{T}, o_{T}, \cdots, i_{t+1}, o_{t+1}, i_{t}, i_{t-1}, o_{t-1}, \cdots, i_{1}, o_{1}\right)=P\left(o_{t} | i_{t}\right), \quad t=1,2, \cdots, T$

III. The three basic problems of the Hidden Markov Model

 The hidden Markov model can be used to solve three basic problems:
  • Probability calculation, also called evaluation (Evaluation):
    • Given the model $\lambda=(\mathbb{Q}, \mathbb{V}, \vec{\pi}, \mathbf{A}, \mathbf{B})$ and an observation sequence $\mathbf{O}=\left(o_{1}, o_{2}, \cdots, o_{T}\right)$, compute the probability $P(\mathbf{O} | \lambda)$ of the observation sequence $\mathbf{O}$
    • I.e., use the forward algorithm to evaluate the degree of match between the model $\lambda$ and the observation sequence $\mathbf{O}$
    • For example: given a hidden Markov model of the weather, consisting of the first day's weather distribution, the weather transition probability matrix, and the probability distribution of leaf humidity given the weather, find the probability that the leaf humidity is 1 on the first day, 2 on the second day, and 3 on the third day
  • Model building, i.e., the learning problem (Learning):
    • Given an observation sequence $\mathbf{O}=\left(o_{1}, o_{2}, \cdots, o_{T}\right)$, estimate the parameters of the model $\lambda=(\mathbb{Q}, \mathbb{V}, \vec{\pi}, \mathbf{A}, \mathbf{B})$ so that the probability of the observation sequence under the model, $P(\mathbf{O} ; \lambda)$, is maximized.
    • I.e., use maximum likelihood estimation (via the EM algorithm) to estimate the parameters
    • For example: given that the leaf humidity is 1 on the first day, 2 on the second day, and 3 on the third day, learn a hidden Markov model of the weather, including the first day's weather distribution, the weather transition probability matrix, and the probability distribution of leaf humidity given the weather.
  • Hidden state inference, i.e., the decoding problem (Decoding):
    • Given the model $\lambda=(\mathbb{Q}, \mathbb{V}, \vec{\pi}, \mathbf{A}, \mathbf{B})$ and an observation sequence $\mathbf{O}=\left(o_{1}, o_{2}, \cdots, o_{T}\right)$, find the state sequence $\mathbf{I}=\left(i_{1}, i_{2}, \cdots, i_{T}\right)$ that maximizes the conditional probability $P(\mathbf{I} | \mathbf{O})$ for the given observation sequence.
    • That is, given an observation sequence, find the most likely corresponding state sequence.
    • For example: in speech recognition, the speech signal is observed and the text is hidden. The goal of the decoding problem is to infer the most likely word sequence from the observed speech signal.
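For the decoding problem, the standard dynamic-programming solution is the Viterbi algorithm. Below is a minimal sketch with toy parameters that are my own illustrative assumptions (observations are encoded as indices into $\mathbb{V}$):

```python
# Viterbi: find the state sequence I maximizing P(I | O) for a given O.
pi = [0.5, 0.5]
A = [[0.8, 0.2], [0.3, 0.7]]             # A[i][j] = P(i_{t+1} = j | i_t = i)
B = [[0.6, 0.3, 0.1], [0.1, 0.4, 0.5]]   # B[j][k] = P(o_t = k | i_t = j)

def viterbi(obs):
    n = len(pi)
    # delta[i]: probability of the best path ending in state i so far.
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]
    psi = []  # backpointers: best predecessor of each state at each step
    for o in obs[1:]:
        best = [max(range(n), key=lambda j: delta[j] * A[j][i])
                for i in range(n)]
        delta = [delta[best[i]] * A[best[i]][i] * B[i][o] for i in range(n)]
        psi.append(best)
    # Backtrack from the most probable final state.
    path = [max(range(n), key=lambda i: delta[i])]
    for back in reversed(psi):
        path.append(back[path[-1]])
    return list(reversed(path)), max(delta)

path, p = viterbi([0, 1, 2])   # e.g. observations: dry, damp, soggy
print(path, p)
```

At each step it keeps, for every state, the probability of the best path ending there plus a backpointer, then backtracks from the best final state; this costs $O(TQ^{2})$ instead of enumerating all $Q^{T}$ paths.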
 

IV. Evaluation

  Given the model $\lambda=(\mathbb{Q}, \mathbb{V}, \vec{\pi}, \mathbf{A}, \mathbf{B})$ and an observation sequence $\mathbf{O}=\left(o_{1}, o_{2}, \cdots, o_{T}\right)$, compute the probability $P(\mathbf{O} | \lambda)$ of the observation sequence $\mathbf{O}$.

4.1 The naive approach

  The most direct way is to compute the probability by definition: enumerate all possible state sequences $\mathbf{I}=\left(i_{1}, i_{2}, \cdots, i_{T}\right)$ of length $T$, compute the joint probability $P(\mathbf{O}, \mathbf{I} | \lambda)$ of each state sequence $\mathbf{I}$ with the observation sequence $\mathbf{O}=\left(o_{1}, o_{2}, \cdots, o_{T}\right)$, and then sum over all possible state sequences to obtain $P(\mathbf{O} | \lambda)$.

   The calculation is:

$P(\mathbf{O} | \lambda)=\sum_{\mathbf{I}} P(\mathbf{O}, \mathbf{I} | \lambda)=\sum_{i_{1}, i_{2}, \cdots, i_{T}} \pi_{i_{1}} b_{i_{1}}\left(o_{1}\right) a_{i_{1}, i_{2}} b_{i_{2}}\left(o_{2}\right) \cdots a_{i_{T-1}, i_{T}} b_{i_{T}}\left(o_{T}\right)$

The direct calculation above has time complexity $O(T Q^{T})$, which is far too large: it is feasible only in theory and cannot be used in practice.
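The brute-force enumeration just described can be sketched as follows (the toy parameters are my own illustrative assumptions, and observations are indices into $\mathbb{V}$):

```python
from itertools import product

pi = [0.5, 0.5]
A = [[0.8, 0.2], [0.3, 0.7]]             # A[i][j] = P(i_{t+1} = j | i_t = i)
B = [[0.6, 0.3, 0.1], [0.1, 0.4, 0.5]]   # B[j][k] = P(o_t = k | i_t = j)

def brute_force_prob(obs):
    """P(O | lambda) by summing P(O, I | lambda) over every state sequence I."""
    total = 0.0
    # All Q^T state sequences of length T -- exponential in T.
    for I in product(range(len(pi)), repeat=len(obs)):
        p = pi[I[0]] * B[I[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[I[t - 1]][I[t]] * B[I[t]][obs[t]]
        total += p
    return total

print(brute_force_prob([0, 1, 2]))
```

With $Q=2$ and $T=3$ this is only 8 sequences, but the count grows as $Q^{T}$, which is why the forward algorithm below is needed.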

 

4.2 The forward algorithm

  The essence of the forward algorithm is to introduce the idea of dynamic programming: the probability that the observation sequence $\mathbf{O}=\left(o_{1}, o_{2}, \cdots, o_{T}\right)$ appears is decomposed into the probability of observing $o_{1}$ at $t=1$, $\times$ the probability of observing $o_{2}$ at $t=2$, $\times \cdots \times$ the probability of observing $o_{T}$ at $t=T$.
   Therefore we define the forward probability: the probability that at time $t$ the observations so far are $o_{1}, o_{2}, \cdots, o_{t}$ and the hidden state is $q_{i}$, written $\alpha_{t}(i)=P\left(o_{1}, o_{2}, \cdots, o_{t}, i_{t}=i ; \lambda\right)$.
  If we can recursively compute the forward probability at the last time step starting from the initial state, then summing it over the final states yields the probability of the observation sequence $\mathbf{O}$.
  Multiplying the forward probability $\alpha_{t}(j)$ by the transition probability $a_{j, i}$ gives the probability that, with the observations $o_{1}, \cdots, o_{t}$ unchanged, the hidden state is $q_{j}$ at time $t$ and $q_{i}$ at time $t+1$. Summing over $j$ and multiplying by the observation probability $b_{i}\left(o_{t+1}\right)$ yields the recursion:

$\alpha_{t+1}(i)=\left[\sum_{j=1}^{Q} \alpha_{t}(j) a_{j, i}\right] b_{i}\left(o_{t+1}\right)$
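The forward recursion can be sketched as follows, with toy parameters that are my own illustrative assumptions; it computes the same value as brute-force enumeration but in $O(TQ^{2})$ time:

```python
pi = [0.5, 0.5]
A = [[0.8, 0.2], [0.3, 0.7]]             # a_{j,i} = A[j][i]
B = [[0.6, 0.3, 0.1], [0.1, 0.4, 0.5]]   # b_i(k) = B[i][k]

def forward_prob(obs):
    """P(O | lambda) via the forward algorithm."""
    # Initialization: alpha_1(i) = pi_i * b_i(o_1).
    alpha = [pi[i] * B[i][obs[0]] for i in range(len(pi))]
    # Recursion: alpha_{t+1}(i) = (sum_j alpha_t(j) * a_{j,i}) * b_i(o_{t+1}).
    for o in obs[1:]:
        alpha = [sum(alpha[j] * A[j][i] for j in range(len(pi))) * B[i][o]
                 for i in range(len(pi))]
    # Termination: P(O | lambda) = sum_i alpha_T(i).
    return sum(alpha)

print(forward_prob([0, 1, 2]))
```

Because each $\alpha_{t+1}$ reuses the already-computed $\alpha_{t}$, the exponential enumeration collapses into $T$ steps of $Q \times Q$ work.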
  

Origin www.cnblogs.com/hithongming/p/12083836.html