The Viterbi Algorithm Explained in Plain Language, Part 2

https://blog.csdn.net/athemeroy/article/details/79342048
This article aims to explain, as plainly as possible, what the hidden Markov model has in common with the fence-network shortest-path problem from the previous article, and where the two differ; a few essential formulas are unavoidable. If you have math phobia, feel free to skip everything marked "Note". My probability theory is shaky, so if a "Note" contains an error, please call it out directly and without mercy. Thank you!

Hidden Markov models (HMM) and fence networks (what's the connection?)
Let's leave the "hidden" part aside for now: what is a Markov model?
Judging by the name alone, a hidden Markov model must be related to a Markov model.
A Markov model describes, roughly speaking, how a person's state changes (Narrator: ???). For example, on any given day a person might have a mild cold, might be normal, or might have a bad cold (please forgive my limited understanding of disease). If you think this is completely random, God flipping a coin, we could write, for example:

P(body state = normal) = 0.7
P(body state = mild cold) = 0.2
P(body state = bad cold) = 0.1
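As a tiny sketch of this "coin flip" view (the state names and probabilities are exactly the ones above; everything else is just Python plumbing):

```python
import random

# The prior distribution over body states given above.
prior = {"normal": 0.7, "mild cold": 0.2, "bad cold": 0.1}

# "God flips a coin": draw one day's state according to these probabilities.
state = random.choices(list(prior), weights=list(prior.values()), k=1)[0]
print(state)
```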

But more likely, a given day's body state depends on how this person has been for a long... no wait, on the body state of the previous day (or previous N days). For example, someone who is normal today will probably still be normal tomorrow, while someone with a mild cold is more likely than a normal person to wake up tomorrow with a bad cold.
If we believe that each day's body state is determined entirely by the previous day's body state (this is called the first-order Markov assumption, arguably one of the 10,000 most irresponsible assumptions in the world; the good news is that in many cases it works quite well), then we can draw a figure like this:
[Figure 1: the state transition diagram]
A brief explanation: if someone had a mild cold yesterday, then today he has a 0.4 probability of staying mildly ill, a 0.4 probability of getting better, and a 0.2 probability of getting worse.
Emmmm... it looks a bit messy. With only three states it's still manageable, but with a few more states I suspect our eyes would glaze over!
So in general we use a form like the one below to describe the whole Markov process.

[Figure: the Markov process written out as a table]
Much clearer! And if we strip out the redundant words and use a bare matrix instead, it can be written like this (this is called the state transition matrix):

[Figure: the state transition matrix]
Of course, where there is a first-order Markov assumption there are also second-order, third-order, and even N-order ones: they assume that today's state (time t) is determined only by the states of the previous n days (times t-1, t-2, ..., t-n). For example, under a second-order Markov assumption, if someone had a bad cold the day before yesterday and a mild cold yesterday, then today the cold may well be nearly gone (wishful thinking); today's state depends only on those two days and has nothing to do with any earlier history.
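As a sketch, here is the first-order transition matrix in code. Only the "mild cold" row (0.4, 0.4, 0.2) is given explicitly in Figure 1; the remaining entries are assumed values, chosen here to be consistent with the path probabilities quoted later in this article:

```python
import numpy as np

states = ["normal", "mild cold", "bad cold"]

# Transition matrix A, where A[i, j] = P(tomorrow = states[j] | today = states[i]).
# The "mild cold" row is from Figure 1; the other entries are assumptions.
A = np.array([
    [0.7, 0.2, 0.1],   # from "normal"    (assumed)
    [0.4, 0.4, 0.2],   # from "mild cold" (from the article)
    [0.2, 0.5, 0.3],   # from "bad cold"  (assumed)
])

# First-order Markov: tomorrow's distribution is today's distribution times A.
today = np.array([0.0, 1.0, 0.0])   # we know today is "mild cold"
tomorrow = today @ A
print(dict(zip(states, tomorrow)))  # {'normal': 0.4, 'mild cold': 0.4, 'bad cold': 0.2}
```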

Markov models and the fence network
Now we can bring over the fence network from the previous article.
[Figure 3: the fence network from the previous article]
Let's make a simple change to this picture: remove A and E (we could keep A, which would then act as the so-called "prior", but let's not worry about that for now), and tweak the edge weights a bit. We get the "lifesaving fence network":
[Figure 4: the "lifesaving fence network"]

"I can in the river, painting a full day chart!" Nat Pug happily said.

It's easy to see that by replacing the distances between the original nodes with the transition probabilities from the state transition matrix, we can easily turn the "runners never tire fence network" into the "lifesaving fence network".
At this point your teacher can easily quiz you: "A man is known to have had a bad cold on February 1 and to be normal on February 5; tell me which sequence of states he most probably went through!" (Narrator: God knows what he went through, probably took some cold medicine!) You'll find this is almost no different from the "runners never tire fence network" of the previous article. If you insist on a difference: since each day's transition is independent, the probabilities need to be multiplied together (see the Note), and then we see which product is the largest, rather than simply adding legs up as in the errand-running version.
[Figure 5]
For example, the probability that he went through "bad cold, normal, mild cold, bad cold, mild cold, normal" (VO: what a frail man!) is P = 0.2 × 0.2 × 0.2 × 0.5 × 0.4 = 0.0016. We could compute the probability of every path and take the maximum, but having learned the Viterbi algorithm in the previous article, you should be able to prune away a great many paths. For contrast, a brute-force version is sketched below.
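Here is what the brute-force answer to the teacher's quiz looks like, as a sketch; the transition numbers reuse the partly assumed matrix from the earlier sketch (under those numbers, the "frail man" path above indeed scores 0.2 × 0.2 × 0.2 × 0.5 × 0.4 = 0.0016). The whole point of the Viterbi algorithm is to avoid this exponential enumeration:

```python
from itertools import product

states = ["normal", "mild cold", "bad cold"]

# Transition probabilities; the "mild cold" row is from the article,
# the rest are the assumed values used in the earlier sketch.
P = {
    "normal":    {"normal": 0.7, "mild cold": 0.2, "bad cold": 0.1},
    "mild cold": {"normal": 0.4, "mild cold": 0.4, "bad cold": 0.2},
    "bad cold":  {"normal": 0.2, "mild cold": 0.5, "bad cold": 0.3},
}

def path_prob(path):
    """Multiply the transition probabilities along a state sequence."""
    p = 1.0
    for a, b in zip(path, path[1:]):
        p *= P[a][b]
    return p

# First and last day are fixed; enumerate every choice for the days between.
candidates = (("bad cold", *mid, "normal") for mid in product(states, repeat=4))
best = max(candidates, key=path_prob)
print(best, path_prob(best))
```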

Note: why can we simply multiply the probabilities along the path? The (first-order) Markov assumption actually says the following (the vertical bar | means "given, conditioned on"): P(Xn+1 | X1, X2, ..., Xn) = P(Xn+1 | Xn) (from Wikipedia). That is, the probability of the (n+1)-th state given all of the first n states equals its probability given only the n-th state (hence "it depends only on yesterday, not on the day before"). And by the definition of conditional probability we have
P(Xn+1 = mild cold, Xn = bad cold) = P(Xn+1 = mild cold | Xn = bad cold) · P(Xn = bad cold)    (0)
That is, the probability that this person had a bad cold yesterday and a mild cold today (the joint probability) equals the probability of a mild cold today given a bad cold yesterday (the transition probability) times the probability that he had a bad cold yesterday. So if we already know that he had a bad cold yesterday, then P(Xn = bad cold) = 1, and the joint probability equals the transition probability.
What about transitioning over two days? Say we know someone has a bad cold on day n, and we want the probability that he has a mild cold on day n+1 and is normal on day n+2. Let's try writing out the formula:
P(Xn+2 = normal, Xn+1 = mild cold, Xn = bad cold)    (1)
= P(Xn+2 = normal | Xn+1 = mild cold, Xn = bad cold) × P(Xn+1 = mild cold, Xn = bad cold)
In words: the probability of seeing "bad cold, mild cold, normal" over these three days equals the probability of being "normal" on the last day given "bad cold, mild cold" on the first two days, times the probability of "bad cold, mild cold" on the first two days. By the first-order Markov assumption, the last day's "normal" has nothing to do with the first day's state and depends only on the second day's state, which means the first factor on the right-hand side is simply the day-two-to-day-three transition probability:
P(Xn+2 = normal | Xn+1 = mild cold, Xn = bad cold) = P(Xn+2 = normal | Xn+1 = mild cold)
So (1) becomes:
P(Xn+2 = normal, Xn+1 = mild cold, Xn = bad cold)
= P(Xn+2 = normal | Xn+1 = mild cold) × P(Xn+1 = mild cold, Xn = bad cold)
Now look at the last factor: we know this one, it is (0) from above:
P(Xn+1 = mild cold, Xn = bad cold) = P(Xn+1 = mild cold | Xn = bad cold) · P(Xn = bad cold)
So finally (1) becomes:
P(Xn+2 = normal, Xn+1 = mild cold, Xn = bad cold)
= P(Xn+2 = normal | Xn+1 = mild cold) · P(Xn+1 = mild cold | Xn = bad cold) · P(Xn = bad cold)
In other words, if we know the probability P(Xn = bad cold) of a bad cold on the first day, then to get the probability of "mild cold, normal" over the following two days we just multiply on the corresponding transition probabilities. So given a state sequence of a Markov chain, multiplying all the transition probabilities together from start to finish gives the probability of that state sequence occurring.
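The note above is just the chain rule of probability combined with the Markov assumption; written as one general formula (same notation as above):

```latex
\begin{aligned}
P(X_1, X_2, \dots, X_T)
  &= P(X_1)\prod_{t=2}^{T} P(X_t \mid X_1, \dots, X_{t-1})
     && \text{(chain rule, always true)} \\
  &= P(X_1)\prod_{t=2}^{T} P(X_t \mid X_{t-1})
     && \text{(first-order Markov assumption)}
\end{aligned}
```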

What is a hidden Markov model?

That wraps up Markov models. So what is a hidden Markov model?
Let's look at another example. In a small town with poor medical care, there is no way to test whether a person is actually normal, has a mild cold, or has a bad cold. The town's barefoot doctor can only see whether people are lively, coughing, feverish, or have diarrhea (aside: how do you see diarrhea?!). In this example we treat these four things (lively, cough, fever, diarrhea) as outward manifestations, while normal, mild cold, and so on are called internal states (these are the "hidden" states). We have no way to observe the internal states directly, because they are hidden (there are no medical instruments), but we can observe the outward manifestations directly.
Assuming that a patient's outward manifestation on a given day depends only on that day's internal state and on nothing else (this is called the independent output assumption), the relationship is given in the table below (the corresponding matrix is called the emission matrix, presumably because the outward manifestation is emitted (output) by the internal state):
[Figure 7: the emission matrix]
We can infer a person's internal state by observing his outward manifestation. For example, if a person is alive and kicking, common sense says you should infer that he is normal (this is really maximum-likelihood inference, but let's not get into that yet). But if I tell you that from February 1 to February 5 this person showed diarrhea, cough, fever, cough, lively (this is called the output sequence) and ask you to write down the most likely state sequence, that is, what his internal state was on each day, the difficulty jumps considerably.
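As a sketch of that "common sense" inference: pick the internal state under which the observation is most likely (maximum likelihood). The fever column (0, 0.2, 0.3) is implied by the worked example further down (Figures 10 and 11); every other emission value here is an assumption made up for illustration:

```python
# Emission matrix B, where B[state][obs] = P(outward manifestation | internal state).
# The "fever" column is implied by the article's worked example; the rest is assumed.
B = {
    "normal":    {"lively": 0.7, "cough": 0.2, "fever": 0.0, "diarrhea": 0.1},
    "mild cold": {"lively": 0.2, "cough": 0.4, "fever": 0.2, "diarrhea": 0.2},
    "bad cold":  {"lively": 0.1, "cough": 0.3, "fever": 0.3, "diarrhea": 0.3},
}

def ml_state(obs):
    """Maximum-likelihood guess of the internal state from a single observation."""
    return max(B, key=lambda s: B[s][obs])

print(ml_state("lively"))   # 'normal', matching the common-sense inference
print(ml_state("fever"))    # 'bad cold' under these assumed numbers
```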

The hidden Markov model and the fence network

In fact, if we ignore the edge weights, the whole problem is no different from the network in Figure 4. Each day can be one of three states (although, looking at the emission matrix: if there is a fever on day three, day three is unlikely to be "normal". Why?), and each day's state follows the first-order Markov assumption, depending only on the previous day's state; as far as the nodes go, the graph is identical.

But the probabilities (the edge weights) are different. Take the example just mentioned: if there is a fever on day three, then the weight on any edge pointing from a day-two node to the day-three "normal" node should be 0, because the "normal" state emits fever with probability 0. So although the state at time t depends only on the state at time t-1, once the output sequence is given, each moment's output provides extra information about that day's state.
Here's an even simpler example:

Given that someone has a mild cold on day one and a fever on day two, find the most likely internal state on day two.

On the surface it looks like this: from the mild cold on day one to the three possible states on day two, with only the corresponding transition probabilities:
[Figure 8]
But it is really better viewed like this: the mild cold on day one transitions into three possible states on day two, and each of those states then emits one of the four outward manifestations according to its own emission probabilities:
[Figure 9]
which in turn reduces to this: since the man's outward manifestation on day two is a fever, the other manifestations need not be considered:
[Figure 10]
Note, though, that this is not a genuine state transition: the internal state still depends only on the previous day's state. It's just that the extra conditioning information lets us "discount" our estimates of certain internal states.

Note: here 0.4 × 0.2 = 0.08 is actually the joint probability
P(Xn-1 = mild cold, Xn = mild cold, Yn = fever)    (2)
= P(Yn = fever | Xn-1 = mild cold, Xn = mild cold) · P(Xn-1 = mild cold, Xn = mild cold).
By the independent output assumption, the outward manifestation "fever" on day n depends only on day n's state, and not on the previous day's state:
P(Yn = fever | Xn-1 = mild cold, Xn = mild cold) = P(Yn = fever | Xn = mild cold),
so (2) finally becomes
P(Xn-1 = mild cold, Xn = mild cold, Yn = fever)
= P(Yn = fever | Xn = mild cold) · P(Xn-1 = mild cold, Xn = mild cold).
The second factor we have already discussed many times; it is easy to compute from the transition probabilities. The first factor is the emission probability of the outward manifestation given the internal state.

So in the end the weights become:
[Figure 11]
As the figure shows, the most likely internal state on day two is a mild cold (the three probabilities being 0, 0.08, and 0.06). The same computation is sketched in code below.
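A minimal sketch of that single lattice step (the transition row 0.4/0.4/0.2 is from Figure 1; the fever emission values 0/0.2/0.3 are the ones implied by Figures 10 and 11):

```python
# Score each candidate day-two state by transition prob x emission prob,
# given "mild cold" on day one and "fever" observed on day two.
trans_from_mild = {"normal": 0.4, "mild cold": 0.4, "bad cold": 0.2}
emit_fever      = {"normal": 0.0, "mild cold": 0.2, "bad cold": 0.3}

scores = {s: trans_from_mild[s] * emit_fever[s] for s in trans_from_mild}
print(scores)                        # {'normal': 0.0, 'mild cold': 0.08, 'bad cold': 0.06}
print(max(scores, key=scores.get))   # 'mild cold'
```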
A careful reader will notice that the three paths do not sum to one. Indeed: because this joint probability carries "one more condition" than the plain transition probability, strictly speaking it should be "normalized" using Bayes' formula. But since all we need is to compare which path's probability is larger, simply multiplying the factors together is perfectly fine.

Note: in theory, the quantity we should attach to each path is really "the probability that the internal state on day two is a mild cold, given a mild cold on day one and a fever observed on day two", which is a conditional probability:
P(Xn = mild cold | Yn = fever, Xn-1 = mild cold),
and not just the joint probability. By Bayes' formula we have
P(Xn = mild cold | Yn = fever, Xn-1 = mild cold)
= P(Xn-1 = mild cold, Xn = mild cold, Yn = fever) / P(Yn = fever, Xn-1 = mild cold)
= P(Yn = fever | Xn-1 = mild cold, Xn = mild cold) · P(Xn-1 = mild cold, Xn = mild cold) / P(Yn = fever, Xn-1 = mild cold)
= P(Yn = fever | Xn = mild cold) · P(Xn-1 = mild cold, Xn = mild cold) / P(Yn = fever, Xn-1 = mild cold).
The probability in the denominator (the normalization coefficient) can be expanded by the law of total probability; since the output depends only on that day's internal state and not on the previous day's:
P(Yn = fever, Xn-1 = mild cold) = Σ_Xn P(Yn = fever | Xn) · P(Xn | Xn-1 = mild cold) · P(Xn-1 = mild cold).
Notice that this normalization coefficient is the same for every path: it is a constant. So if we only care about which path has the largest probability, whether or not we normalize makes essentially no difference.
Moreover, for tagging problems the hidden Markov model is a generative model: what we want is precisely the joint probability of the whole path, so the normalization really isn't of much significance anyway.
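Putting the whole article together, here is a compact sketch of the Viterbi algorithm for this HMM. The dynamic program itself is the standard one; the start distribution is the prior from the top of the article, the observation sequence is the February 1-5 example, and the transition and emission matrices reuse the partly assumed numbers from the earlier sketches:

```python
def viterbi(obs_seq, states, start, trans, emit):
    """Most likely internal state sequence for an observation sequence."""
    # delta[s]: probability of the best path so far that ends in state s.
    delta = {s: start[s] * emit[s][obs_seq[0]] for s in states}
    backpointers = []
    for o in obs_seq[1:]:
        prev, step = delta, {}
        delta = {}
        for s in states:
            best = max(states, key=lambda a: prev[a] * trans[a][s])
            step[s] = best
            delta[s] = prev[best] * trans[best][s] * emit[s][o]
        backpointers.append(step)
    # Recover the path by walking the backpointers from the best final state.
    last = max(states, key=lambda s: delta[s])
    path = [last]
    for step in reversed(backpointers):
        path.append(step[path[-1]])
    return list(reversed(path)), delta[last]

states = ["normal", "mild cold", "bad cold"]
start = {"normal": 0.7, "mild cold": 0.2, "bad cold": 0.1}   # the prior from the top
trans = {                                                    # partly assumed, as before
    "normal":    {"normal": 0.7, "mild cold": 0.2, "bad cold": 0.1},
    "mild cold": {"normal": 0.4, "mild cold": 0.4, "bad cold": 0.2},
    "bad cold":  {"normal": 0.2, "mild cold": 0.5, "bad cold": 0.3},
}
emit = {                                                     # fever column implied, rest assumed
    "normal":    {"lively": 0.7, "cough": 0.2, "fever": 0.0, "diarrhea": 0.1},
    "mild cold": {"lively": 0.2, "cough": 0.4, "fever": 0.2, "diarrhea": 0.2},
    "bad cold":  {"lively": 0.1, "cough": 0.3, "fever": 0.3, "diarrhea": 0.3},
}

obs = ["diarrhea", "cough", "fever", "cough", "lively"]      # February 1-5
print(viterbi(obs, states, start, trans, emit))
```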

Next article:
Why can the tagging problem be viewed as a hidden Markov model?

  • The n-gram language model
  • A digression that isn't: the Pinyin input method
  • Tagging
    How does the Viterbi algorithm solve the tagging problem?

