Maximum Entropy Markov Models (MEMM)

 

Definition:

An MEMM is a conditional probability model: given the current observation and the previous state, it gives the probability distribution over the current state.

 

 

- S: a finite set of states
- O: the set of observations
- Pr(s | s', o): the transition probability of the current state s given the previous state s' and the current observation o
- Pr0(s): the initial state distribution

 

Legend: O represents the set of observations, S represents the set of states, and M represents the model.
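The components listed above can be sketched as plain Python data structures. This is a minimal illustration with made-up states, observations, and probability values (none of them come from the figure); the key point is that each conditional distribution Pr(s | s', o) is normalized locally, per (previous state, observation) pair.

```python
# Minimal sketch of the MEMM components: states S, observations O,
# transition probabilities Pr(s | s', o), and initial distribution Pr0(s).
# All numeric values below are illustrative assumptions, not figure data.

states = {"s1", "s2"}            # S: finite set of states
observations = {"o1", "o2"}      # O: set of observations

# Pr0(s): initial state distribution
pr0 = {"s1": 0.5, "s2": 0.5}

# Pr(s | s', o): probability of the current state given the previous
# state s' and the current observation o, stored row by row
trans = {
    ("s1", "o1"): {"s1": 0.4, "s2": 0.6},
    ("s1", "o2"): {"s1": 0.55, "s2": 0.45},
    ("s2", "o1"): {"s1": 0.7, "s2": 0.3},
    ("s2", "o2"): {"s1": 0.5, "s2": 0.5},
}

# Each conditional distribution sums to 1: this per-state local
# normalization is what later gives rise to the label bias problem.
assert all(abs(sum(row.values()) - 1.0) < 1e-9 for row in trans.values())
```

Note that the normalization is per row, not over whole paths; a state with few outgoing branches therefore passes its probability mass on almost undiminished.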

 

Drawbacks of Maximum Entropy Markov Models (MEMM):

From the figure above, find the most likely sequence of hidden states S given the observation sequence O:

Path s1-s1-s1-s1: probability 0.4 * 0.5 * 0.45 = 0.09

Path s2-s2-s2-s2: probability 0.2 * 0.3 * 0.3 = 0.018

Path s1-s2-s1-s2: probability 0.6 * 0.2 * 0.5 = 0.06

Path s1-s1-s2-s2: probability 0.4 * 0.55 * 0.3 = 0.066

Hence the optimal path is s1-s1-s1-s1.

In fact, in the figure above, state 1 tends to transition to state 2, and state 2 tends to stay in state 2, yet the optimal path stays entirely in state 1. This is called the label bias problem: because states have different numbers of outgoing branches, local normalization leaves their probability distributions unbalanced, so some state transitions are unfairly favored.

From the two figures above, the maximum entropy Markov model (MEMM) can only reach a locally optimal solution, not a globally optimal one. MEMM thus removes the HMM's output independence assumption, but it suffers from the label bias problem.

 

The label bias problem in MEMM
As shown in the figure, the word glossed "as" is a preposition (p), but MEMM mislabels it as a conjunction (c). A possible cause of this is the label bias problem.
Reason: the word glossed "YES" has two possible parts of speech, verb v and pronoun r, contained in state set S1; the word glossed "as" has two possible parts of speech, preposition p and conjunction c, contained in state set S2; the word glossed "event" has only one part of speech, noun n, contained in state set S3. Since an MEMM defines an exponential model for each state, P(n | p) = 1, P(n | c) = 1, and P(p | S1) + P(c | S1) = 1. By the Markov assumption,
P(S1, p, n) = P(p | S1) * P(n | p) = P(p | S1), and similarly P(S1, c, n) = P(c | S1) * P(n | c) = P(c | S1). Thus the choice between node p and node c in S2 depends only on P(p | S1) and P(c | S1), i.e., only on the context of "YES"; the context of "as" itself is ignored. This is exactly how an MEMM produces label bias.
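The collapse of the path score onto the first transition can be shown in a few lines. The values of P(p | S1) and P(c | S1) below are assumptions chosen so that c wins, matching the mislabeling described above; the forced transitions P(n | p) = P(n | c) = 1 come from the text.

```python
# Label bias sketch: because both p and c lead to n with probability 1,
# the full path score equals the first transition probability alone.
# P(p|S1) and P(c|S1) are assumed values (summing to 1); P(n|.) = 1 is
# forced because n is the only successor, as stated in the text.

p_from_S1 = {"p": 0.4, "c": 0.6}   # assumed: P(p|S1), P(c|S1)
p_n_given = {"p": 1.0, "c": 1.0}   # forced: P(n|p) = P(n|c) = 1

# P(S1, tag, n) = P(tag|S1) * P(n|tag) = P(tag|S1)
score = {tag: p_from_S1[tag] * p_n_given[tag] for tag in ("p", "c")}
assert score == p_from_S1  # the context of "as" contributed nothing

print(max(score, key=score.get))  # c  (the wrong tag wins)
```

Because the second factor is always 1, no evidence about the word "as" can ever overturn the first transition, which is the bias in miniature.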


Origin www.cnblogs.com/shona/p/11415130.html